A FRONTAL ATTACK ON THE BASIC PROBLEM IN EVALUATION:
THE ACHIEVEMENT OF THE OBJECTIVES OF INSTRUCTION IN SPECIFIC AREAS*


KENNETH E. ANDERSON 


University of Kansas 
Lawrence, Kansas 


Educators have long busied themselves with statements of objectives for the secondary school and for specific areas or subjects in the curriculum. Few educators have attempted to relate these objectives to secondary school practices in a realistic manner. Science educators are not excepted. Out of the literature there has emerged a fairly clear-cut body of objectives. Mere lip service in regard to these objectives is hardly sufficient to produce a realistic program of instruction which should: (1) include continuous measurement and evaluation; (2) utilize those factors in the teaching situation which contribute to student achievement of the objectives; (3) demand rigorous research in and investigation of science instruction to the end that it be improved. The writer believed that science education needed a study which would endeavor to: (1) alert science educators to continuous measurement and evaluation; (2) reveal those factors in the teaching situation which would contribute most to the achievement of objectives; and (3) be rigorous in its design and method. The present study, therefore, was designed not as a flanking movement, but as a frontal attack on the basic problem in evaluation: the achievement of the objectives of instruction, or specifically, the relative achievements of the objectives of science instruction in a representative sampling of schools.


Statement of the Problem


The problem was to determine the present status of science instruction in two fields in Minnesota secondary schools and to find what factors inherent in the pupil or in the teaching situation make for a better realization of the objectives of science instruction. The study was confined to the two secondary school sciences: biology and chemistry.

The objectives of science instruction as re- 
ported in the literature on science education 
were coalesced into four about which the entire 
study was pivoted. These objectives were: ac- 
quisition of factual information in science, the 
understanding of the principles of science, the 
understanding and use of the scientific method, 
and the acquisition of scientific attitudes. 

Specifically, the present study had three 
purposes: (1) to determine those factors in the 
teaching situation which contribute most to the 
achievement of the objectives of science instruc- 
tion; (2) to determine the relative contributions 
of factual information, understanding of princi- 
ples, scientific attitudes, and intelligence, to 
the understanding of the scientific method; and 
(3) to describe the current practices in the 
teaching of biology and chemistry and to de- 
scribe the persons who taught these subjects in 
terms of preparation, experience, teaching 
load, teaching objectives, and professional ac- 
tivities. 


Selection of the Sample 





In order to draw conclusions concerning sci- 
ence teaching for the State of Minnesota, it was 
imperative that the schools participating in the 
study (the sample) be representative of the high 
schools of Minnesota (the population). In order 
to increase the accuracy and representativeness 
of the sample the method of stratified sampling 
(random samples from groups) was applied. In
stratified sampling the population to be sampled
is divided into groups or strata. Different numbers,
proportional to the total numbers in each
stratum in the population, are then selected
from each stratum by some process of strict
random selection within each stratum. If the
strata are so taken that each constitutes a relatively
homogeneous group, the accuracy of the
sample will be considerably increased, because
each stratum is represented in the correct proportion
in the sample.

* A summary of the Relative Achievement of the Objectives of Secondary School Science in a Representative Sampling of Fifty-Six Minnesota Schools,

The application of this sampling method to 
the problem proceeded as follows: 

Step 1--In order to determine the strata from
which each sample was drawn, it was necessary 
to determine the population of 461 population 
centers maintaining high schools in Minnesota 
in 1944-45. This was done by consulting the 
United States census figures for 1940. Thus
there were 41 cities of 5000 people or more
maintaining high schools, the remaining 420
centers having populations under 5000. Therefore it
was decided to stratify the 483 high schools ac- 
cording to the following three categories: 

Schools located in 

1. centers under 5000 people (420 high 
schools) 
2. centers of 5000 people or more (38 
high schools) 
3. the three big cities (25 high schools) 
The school was, therefore, the sampling unit. 

Step 2--The second step was to select a sam- 
ple of schools in proportion to the numbers 420: 
38:25. Thus 87 percent of the schools should 
come from population centers under 5000 people, 
eight percent from population centers of 5000 
people or more, and five percent from the three 
big cities. 

Step 3--Then, by using Tippett’s ‘‘Random 
Sampling Numbers’’, schools were drawn ac- 
cording to the following procedure: An attempt 
was made to draw schools as closely as possible 
to the ratio of 87:8:5. Schools were contacted by 
letter asking them to participate. One restric- 
tion, that they offer both biology and chemistry, 
was imposed on the schools to whom the letters 
were sent. With few exceptions in the larger 
sized towns, the schools chosen by random sel- 
ection did participate in the study. The final 
sample drawn contained 56 schools offering both 
biology and chemistry. The total number of 
students involved in the biology portion of the 
study was 1980 and the total number of students 
involved in the chemistry portion of the study 
was 1352. It must be pointed out that there was 
a slight disproportionality of schools from the 
larger cities in the total sample. This must be 
taken into account when the generalizations of 
the study are considered. The ratio thus became
86:10:4, whereas the planned ratio was 87:8:5.
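
The proportional allocation in Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal illustration only: the stratum sizes (420, 38, 25) and the sample size of 56 come from the text, but the stratum labels, the function names, the largest-remainder rounding rule, and the use of Python's pseudo-random generator in place of Tippett's printed tables are all assumptions made for the example.

```python
import random

# Stratum sizes reported in the study (number of high schools per stratum).
# The dictionary keys are illustrative labels, not the author's terms.
STRATA = {"under_5000": 420, "5000_or_more": 38, "three_big_cities": 25}


def proportional_shares(strata):
    """Return each stratum's share of the population as a whole percentage."""
    total = sum(strata.values())
    return {name: round(100 * size / total) for name, size in strata.items()}


def allocate(strata, sample_size):
    """Split sample_size across strata in proportion to stratum size.

    Uses largest remainders (an assumed rounding rule) so that the
    allocations always add up to sample_size.
    """
    total = sum(strata.values())
    exact = {name: sample_size * size / total for name, size in strata.items()}
    alloc = {name: int(quota) for name, quota in exact.items()}
    leftover = sample_size - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - alloc[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def draw_stratified_sample(strata, sample_size, seed=0):
    """Draw school numbers at random, without replacement, within each stratum."""
    rng = random.Random(seed)
    alloc = allocate(strata, sample_size)
    return {
        name: sorted(rng.sample(range(1, strata[name] + 1), alloc[name]))
        for name in strata
    }


if __name__ == "__main__":
    print(proportional_shares(STRATA))
    print(allocate(STRATA, 56))
```

The percentage shares reproduce the planned 87:8:5 ratio. The ideal allocation for 56 schools will not match the realized sample exactly, since the study reports that the restriction to schools offering both subjects and a few refusals shifted the final ratio to 86:10:4.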


Information Secured from Schools





Data were secured from the fifty-six schools 
as follows: 


1. Examination scores for each pupil on the 
pre-test, plus information from each pupil as 
to sex, age, science and mathematic courses 
taken, and college plans. 

2. Intelligence test scores by using the ‘‘Otis
Quick-Scoring Mental Ability Tests, Gamma
Test: Form AM.’’

3. Completed schedules from each of the 
teachers. The schedule consisted of fifty-eight 
items calling for quantitative data which could 
be used in making statistical comparisons. 
Some of the items called for information of a 
subjective kind such as ‘‘What do you feel could 
be done to improve science instruction in Min- 
nesota schools?’’ 

4. Final examination scores on the State
Board Examinations constructed by the writer.


The Examinations 





One of the problems in the present study was 
the construction of examinations in biology and 
chemistry that might be found valid and reliable. 
Extreme care was taken to insure that the ex- 
aminations measured what they purported to 
measure, and did so consistently. The use of 
external criteria such as elements of the scien- 
tific method, lists of biological and chemical 
principles, statements of scientific attitudes, 
and the use of item analysis techniques, aided 
in the establishment of valid items. Statistical 
estimates of validity were determined by using 
randomly selected schools and obtaining correl- 
ations between the final examination scores and 
teacher’s estimate of final grade and three established
tests in the field. Not all of these correlations
were high, but all were significant at
the 1 percent level. The reliability of the parts
and of the total for each examination was determined
on the final examination scores by using
Hoyt’s method of analysis of variance. The reliabilities
of the parts, although not high in all
instances, were significant at the 1 percent level.
Both tests had a reliability of over .90 for
all four parts combined. As the total score was
used in all the analyses but one, the reliability 
of .90 was sufficiently high for purposes of com- 
parison. 
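
For readers unfamiliar with Hoyt's procedure, the sketch below shows the usual form of the estimate: a pupils-by-items analysis of variance in which reliability is taken as (mean square for pupils minus residual mean square) divided by the mean square for pupils. The item scores are invented for illustration and the function name is an assumption; the study's own item data are not reproduced in this summary.

```python
def hoyt_reliability(scores):
    """Estimate test reliability by Hoyt's analysis-of-variance method.

    scores: list of rows, one per pupil, each row a list of item scores.
    Returns (MS_pupils - MS_residual) / MS_pupils.
    """
    n = len(scores)        # number of pupils
    k = len(scores[0])     # number of items
    grand = sum(sum(row) for row in scores) / (n * k)
    pupil_means = [sum(row) / k for row in scores]
    item_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pupils = k * sum((m - grand) ** 2 for m in pupil_means)
    ss_items = n * sum((m - grand) ** 2 for m in item_means)
    ss_residual = ss_total - ss_pupils - ss_items

    ms_pupils = ss_pupils / (n - 1)
    ms_residual = ss_residual / ((n - 1) * (k - 1))
    return (ms_pupils - ms_residual) / ms_pupils


if __name__ == "__main__":
    # Hypothetical right/wrong scores for four pupils on three items.
    demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(round(hoyt_reliability(demo), 2))
```

With a real examination the same function would be applied to each part and to the total, which is how separate part and total reliabilities such as those reported above would arise.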


Distributions of Part Scores and Total Scores 





Application of the Chi-square test to the total distributions revealed that the total score distributions in biology and chemistry departed from normality. Histograms, together with data concerning kurtosis and skewness, indicated that the part distributions and the total distributions were uni-modal and exhibited only a moderate departure from normality. Fisher¹ has shown that for curves that exhibit only a moderate departure from normality, the efficiency remains reasonably high. Also, in using the analysis of variance and covariance, the assumptions basic to this technique were tested. Also, for the t-test and the F-test used later in the analysis, it has been shown that no serious error for a slight departure from normality is introduced in the significance levels.²


Typical Comparisons and Statistical Tools Used


Fourteen comparisons were made in biology and fifteen comparisons were made in chemistry using the technique of analysis of variance and covariance whenever possible. Space does not permit a complete description of all of these comparisons but in general, the comparisons were made on the basis of the end scores, holding intelligence and pre-test knowledge constant, using the technique of analysis of variance and covariance. Whenever it was not possible to use the above technique, the d-test of Behrens and Fisher was used. Wherever possible the upper one-fourth of the distribution of schools was compared with the lower one-fourth of the distribution. The number of schools in each group was further reduced by one-fourth and the schools in the reduced sample were chosen proportionately by random means. In comparisons where the upper one-fourth could not be compared with the lower one-fourth, the proportion of schools as they occurred in the original population in any one comparison was reduced by one-fourth. The schools chosen for the reduced sample were chosen by random means. The following sections typify the newer statistical tools used in this study in the comparisons made.


Analysis of Variance and Covariance


In one comparison for biology, there were thirteen teachers who had taken 77 or more quarter hours of science in college (Group A), and thirteen teachers who had taken 32 or fewer quarter hours of science in college (Group B). The ratio 13:13 was reduced to 3:3. The elements in the reduced sample were again selected by a random process. The hypothesis to be tested was that there was no significant difference in mean pupil achievement for students having teachers in the upper one-fourth of the teacher group in total hours of college science as compared with students having teachers in the lower one-fourth of the teacher group in total hours of college science.

Before the pupil data (intelligence test scores, 
pre-test scores, and total final scores) in the 
three classes of pupils (Group A) could be pooled, 
and before the pupil data in the three classes of 
pupils (Group B) could be pooled, two assump- 
tions had to be fulfilled: 

1. That there was no difference between the 
groups to be pooled in regard to standard devi- 
ations. 

2. That there was no difference between the 
groups to be pooled in regard to means. 

The first assumption was tested by using the
Welch-Nayer test on the ‘‘sum of squares within
groups,’’ and the second assumption was
tested by using the F-test, in which F was found
by dividing the mean square between groups by
the mean square within groups. The F was
found by using the analysis of variance, which
assumes equality of variances. This equality was
tested by the previously used Welch-Nayer test.
Tables I and II illustrate these tests for Group
B. By using these tests we concluded that there
was no significant difference between the three
classes to be pooled in the two groups in regard
to standard deviations and in regard to means.
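
The F-test on means described above (mean square between groups divided by mean square within groups) can be illustrated with a short sketch. The three lists of scores are hypothetical stand-ins for the three classes being pooled; the study's Tables I and II are not reproduced in this summary, and the function name is an assumption.

```python
def one_way_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance.

    groups: list of lists, one list of scores per class.
    F is the mean square between groups divided by the mean square within groups.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n

    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


if __name__ == "__main__":
    # Hypothetical final scores for three classes of one teacher group.
    classes = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
    print(one_way_f(classes))
```

The resulting F would then be compared with the tabled value for the two degrees of freedom; pooling is justified only when that F is not significant.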

We were now in a position to test the hypoth- 
esis that there was no difference between the 
two contrasted groups with regard to means on 
the end scores, holding intelligence and pre- 
test scores constant. The assumptions which 
had to be satisfied before the analysis of vari- 
ance and covariance tool could be applied were: 

1. That there was no difference between standard deviations of the first pooled group (Group A) and the second pooled group (Group B).

2. That there was no difference between the within partial regression coefficients of the first pooled group (Group A) and the second pooled group (Group B).

Both of these hypotheses were tested by using the Welch-Nayer test. The first was tested by using the ‘‘sums of squares within groups’’ using the final or end scores only. The second hypothesis was tested by using the adjusted ‘‘sums of squares within groups.’’ We concluded that the two pooled groups had satisfied the assumptions basic to the application of the technique of analysis of variance and covariance. We were now in a position to analyse results and determine the F ratio for the respective pooled groups. This is illustrated in Table III. An F of 8.52 was obtained. Entering Snedecor’s table of F with n₁ = 1 and n₂ = 174, we found that our value of F was greater than the table value at the 1 percent level. We therefore rejected the hypothesis that there was no difference between the two pooled groups with regard to means on the end score, holding pre-test and intelligence test scores constant.

1. R. A. Fisher, "On the Mathematical Foundations of Theoretical Statistics," Philosophical Transactions of the Royal Society of London, A, 222 (1922).

2. W. G. Cochran, "Some Consequences When the Assumptions for the Analysis of Variance Are Not Satisfied," Biometrics, (March 1947).

Which pooled group achieved significantly more on the biology test, holding intelligence and pre-test scores constant? In order to determine this, it was necessary to calculate adjusted means. This is illustrated in Table IV. This was done by adding or subtracting a correction from the mean of the final score. This was done for the first pooled group and the second pooled group. The first correction to be added or subtracted was determined by taking the difference between the mean of the intelligence test scores for each group and that of the grand mean, both groups combined. This difference was then multiplied by the b₁ (obtained from the adjustment table) for within grades. The second correction for the pre-test scores was done in the same manner except the differences were multiplied by b₂ (obtained from the adjustment table) for within grades. Thus, we were able to conclude that on the average the students of teachers in the upper one-fourth in terms of quarter hours of science taken in college achieved significantly more on the biology test, providing the factors of intelligence and pre-test information were partialled out, than did the students of teachers in the lower one-fourth in terms of quarter hours of science taken in college.


Subsequent Tests Used if the Assumptions Basic to Analysis of Variance and Covariance Could Not Be Satisfied


In the event that the assumption for homogeneity of variances was not satisfied, the writer turned to the Behrens-Fisher test or d-test. The d-test does not correct for covariance effects. The d-test does not require that the variances should be homogeneous, and should be used instead of the t-test when the variances are unequal. In every instance where the d-test was applied, it was done on the end scores only and not on the gain between initial and final scores. In the comparison on experience of biology teachers, those teachers in the upper one-fourth in terms of years of experience teaching biology had eight or more years of experience. Teachers in the lower one-fourth had one or less years of experience. The hypothesis to be tested was that there was no difference in achievement on the biology test between the two groups holding intelligence and pre-test knowledge constant. The two groups, however, were not homogeneous with respect to variances. Therefore, the technique of analysis of variance and covariance was not used on the raw scores. In such cases transformation of the raw scores may often make possible the analysis. However, here, transformations were not used. The Behrens-Fisher d-criterion was used on the end scores only. The d formula and substitution in this formula are shown below:

\[
d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sum x_1^2}{N_1(N_1 - 1)} + \dfrac{\sum x_2^2}{N_2(N_2 - 1)}}}
  = \frac{54.76 - 52.07}{\sqrt{\dfrac{30420.2832}{157(157 - 1)} + \dfrac{22586.5149}{101(101 - 1)}}}
  = \frac{2.69}{\sqrt{3.017371}} = \frac{2.69}{1.737} = 1.55
\]

The value of d of 1.55 was not significant and the hypothesis was accepted, that there was no difference between the means of the two groups. The intelligence and pre-test knowledge were not held constant in using the d-test. Wherever the d value proved to be significant, a d-test was run on the intelligence test scores and on the pre-test scores, in order to further substantiate the conclusion.
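
The d-criterion described in this section is the difference between two means divided by a standard error that keeps the two variances separate. The sketch below assumes that form, with each group supplying its sum of squared deviations and its number of cases. The function name and the small worked figures are hypothetical; the last lines only repeat the final step of the printed computation (2.69 divided by 1.737).

```python
import math


def behrens_fisher_d(mean1, mean2, ss1, ss2, n1, n2):
    """Behrens-Fisher d: mean difference over an unpooled standard error.

    ss1, ss2 are sums of squared deviations from each group's own mean;
    n1, n2 are the numbers of cases. The variances are NOT pooled, which is
    why the statistic can be used when the variances are unequal.
    """
    standard_error = math.sqrt(ss1 / (n1 * (n1 - 1)) + ss2 / (n2 * (n2 - 1)))
    return (mean1 - mean2) / standard_error


if __name__ == "__main__":
    # Hypothetical groups: the two variance terms are 1 and 3, so the SE is 2.
    print(behrens_fisher_d(10.0, 8.0, ss1=20.0, ss2=36.0, n1=5, n2=4))

    # Final step of the printed biology comparison, using the printed values.
    print(round((54.76 - 52.07) / 1.737, 2))
```

Judging whether d is significant requires the special Behrens-Fisher tables, which depend on both sample sizes and on the relative size of the two variance terms; that lookup is outside this sketch.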


Subsequent t-Tests 





In the event that a significant F ratio were found by using the analysis of variance and covariance, and if the number of groups being compared were three or more, it became necessary to determine whether or not the difference was between Group A and Group B, between Group B and Group C, or between Group A and Group C. In the comparison involving graduates from universities, from teachers colleges, and from private colleges, an F ratio of 79.36 was obtained. The hypothesis that there was no difference in the mean achievement of students in these three classifications, holding intelligence and pre-test knowledge constant, was rejected. Was the difference between each of the group comparisons significant or was it between just two groups? It was necessary to calculate three t-tests as follows, using this formula for the standard error of a mean difference:

\[
\sqrt{\, s^{2}\left[\frac{N_1 + N_2}{N_1 N_2} + \frac{U^{2}B - 2UVP + V^{2}A}{AB - P^{2}}\right]}
\]

The three t-tests were as follows:

\[
t_0\ (\text{A and B}) = \frac{49.567 - 48.573}{1.165} = .853
\]
\[
t_0\ (\text{C and A}) = \frac{52.420 - 48.573}{.959} = 4.011^{*}
\]
\[
t_0\ (\text{C and B}) = \frac{52.420 - 49.567}{1.033} = 2.761^{*}
\]

* Significant at the 1 percent level.

The degrees of freedom for the above were 215, 340, and 339 respectively. In such cases where the means of selected samples are being compared, instead of the usual probability levels, the following level is used: 1:3(100). Since there are three groups there are (3)(2)/2, or 3, possible combinations of two's. Since the latter values of t reported have corresponding probability values less than 1 in 300, they are judged significant at the 1 percent level. On the basis of the above t-tests, we were able to draw the following conclusions:

1. That there was no statistically significant difference in mean achievement of the students between Group A and Group B, holding intelligence and pre-test information constant.

2. That there was a statistically significant difference in mean achievement of the students between Group A and Group C, holding intelligence and pre-test information constant.

3. That there was a statistically significant difference in mean achievement of the students between Group B and Group C, holding intelligence and pre-test information constant.

Thus, students who had teachers from private colleges achieved on the average significantly more in biology than did students who had teachers from universities and teachers colleges, if the intelligence and the pre-test knowledge of the students were held constant. Perhaps other factors were in operation to produce the difference, but we were unable to state what those factors might be.
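
The arithmetic of the three follow-up t-tests, and of the stricter "1 in 300" level, can be checked with the short sketch below. The adjusted means and standard errors are the values printed in this section; the dictionary layout and function names are assumptions made for the example.

```python
from itertools import combinations

# Adjusted means and standard errors of the differences, as printed above.
ADJUSTED_MEANS = {"A": 48.573, "B": 49.567, "C": 52.420}
STANDARD_ERRORS = {("A", "B"): 1.165, ("A", "C"): 0.959, ("B", "C"): 1.033}


def pairwise_t(means, standard_errors):
    """Return |difference of adjusted means| / standard error for each pair."""
    return {
        pair: abs(means[pair[0]] - means[pair[1]]) / standard_errors[pair]
        for pair in combinations(sorted(means), 2)
    }


def number_of_pairs(k):
    """Number of possible comparisons of two among k groups: k(k - 1)/2."""
    return k * (k - 1) // 2


def per_comparison_level(overall_level, k):
    """Probability level required of each pair so the overall level holds."""
    return overall_level / number_of_pairs(k)


if __name__ == "__main__":
    for pair, t in pairwise_t(ADJUSTED_MEANS, STANDARD_ERRORS).items():
        print(pair, round(t, 3))
    # For three groups at the 1 percent level: 0.01 / 3, i.e. 1 in 300.
    print(per_comparison_level(0.01, 3))
```

The computed ratios agree with the printed t values to within rounding in the third decimal. Dividing the chosen level by the number of pairs is what turns the usual 1 in 100 into the 1 in 300 used in the text.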


Findings 


The findings of this study fall into three cat- 
egories, those obtained: (1) by means of the 
comparisons outlined above, (2) from the inter- 
correlations and multiple correlations, and 
(3) by the analysis of the teacher schedule. 
Tables V, VI, VII, and VIII show the results of 
the fourteen comparisons in biology and the fif- 
teen comparisons in chemistry. 


Comparisons Resulting in Significant Values 





The results of the several comparisons revealed that on the average, students achieved significantly more in biology, holding constant the factors of pre-test knowledge and intelligence, when: (1) the teacher was in the upper one-fourth of the distribution in terms of quarter hours of college science earned, (2) the teacher had graduated from a private college rather than from a university or teachers college, (3) the teacher had a Master’s degree, (4) the number of laboratory hours received by the students was in the upper one-fourth of the state distribution, and (5) the students were in a class the size of which was in the upper one-fourth of the state distribution.

The results of the several comparisons indicated that on the average students achieved significantly more in chemistry, holding constant the factors of pre-test knowledge and intelligence, when: (1) the teacher was in the upper one-fourth of the distribution in terms of quarter hours of college chemistry earned, (2) the students used a laboratory manual, (3) the students elected chemistry, (4) the number of laboratory hours received by the students was in the upper one-fourth of the state distribution, (5) the teacher had graduated from a university or private college rather than a teachers college, and (6) the students were in a class the size of which was in the upper one-fourth of the state distribution.

Several additional comparisons in chemistry were not significant at the 1 percent level or were not significant at this level in all sub-comparisons when a particular comparison had to be broken up into one or more sub-comparisons. The following tentative conclusions were drawn at the 5 percent level of significance or at the 1 percent level in one of the sub-comparisons: that on the average, students achieved significantly more in chemistry, holding constant the factors of pre-test knowledge and intelligence, when (1) the students were in a large sized high school or medium sized high school rather than in a small sized high school (5 percent level), (2) the teacher had ten or more years of experience teaching chemistry or when the teacher was in the upper one-fourth of that distribution (5 percent level), (3) the teacher had one or two preparations rather than six different preparations per day (1 percent level in one of the three sub-comparisons), and (4) the teacher’s knowledge of the scientific method placed her in the upper one-fourth of that distribution (1 percent level in one of the two sub-comparisons).

Two of the comparisons did not lend themselves to treatment by means of the technique of analysis of variance and covariance. Therefore, the d-test was applied, using the end scores only. In addition, d-tests applied to the pre-test scores and the intelligence test scores indicated that the two groups to be compared were significantly different as regards these two factors. Thus, we were able to draw only the following tentative conclusions, that on the average students achieved significantly more in chemistry, when: (1) they planned to go on to college, and (2) the teacher was in the upper one-fourth of the distribution in terms of total quarter hours of college science earned.


TABLE V

BIOLOGY COMPARISONS: ANALYSIS OF VARIANCE AND COVARIANCE, INTELLIGENCE AND PRE-TEST SCORES HELD CONSTANT

Comparison                                        Adjusted Means          t-Tests Using
                                                  if F was Significant    Adjusted Means

Sex
Size of school: large, medium, small
Number of different preparations: 6 vs. 1-2
Biology credits earned by teachers:
  46 or more vs. 16 or less
Science credits earned by teachers:               upper ¼  50.19
  77 or more vs. 32 or less                       lower ¼  46.06
Laboratory manual vs. no laboratory manual
Lab. preceded class discussion vs.
  lab. followed class discussion
Elective vs. required
Teacher’s score on scientific method:
  8 or more points vs. 3 or less
Number of lab. hours: 60 or more                  upper ¼  55.08
  vs. 12 or less                                  lower ¼  52.34
Undergraduate degree: univ. vs.                   univ.    48.57          univ.-tchrs.   .853
  tchrs. college vs. private college              tchrs.   49.57          priv.-univ.   4.011*
                                                  priv.    52.42          priv.-tchrs.  2.761*
Master’s degree vs. none                          Master’s 53.49
                                                  none     50.10
Class size: 29 or more vs. 17 or less             upper ¼  50.87
                                                  lower ¼  45.79

* Significant at 1 percent level.


TABLE VI

BIOLOGY COMPARISON: USING THE END SCORES ONLY (d TEST)

Comparison                                        Ratio of Variances      Means

Experience: 8 or more yrs. vs.                    .590                    upper ¼  54.76
  1 or less yrs.                                                          lower ¼  52.07


TABLE VII

CHEMISTRY COMPARISONS: ANALYSIS OF VARIANCE AND COVARIANCE, INTELLIGENCE AND PRE-TEST SCORES HELD CONSTANT

Comparison                                        Adjusted Means          t-Tests Using
                                                  if F was Significant    Adjusted Means

Sex
Size of school: large, medium, small              large    47.            large-med.     .873
                                                  med.     46.            large-small   2.066*
                                                  small    43.            med.-small    2.103*
Number of different preparations: 6 vs. 1-2       6 prep.    47.
                                                  1-2 prep.  55.
Number of different preparations: 6 vs. 1-2
Chemistry credits earned by teachers:
  35 or more vs. 13 or less
Experience: ten or more years vs.                 upper
  one or less                                     lower
Laboratory manual vs. no lab. manual              none
                                                  manual
Lab. preceded class discussion vs.
  lab. followed class discussion
Elective vs. required                             Elective  41
                                                  Required  34
Teacher’s score on scientific method:             upper ¼  15.78
  9 or more points vs. 4 or less                  lower ¼  12.85
  (two sub-comparisons; the source does not
  show which one these means belong to)
Number of lab. hours: 74 or more                  upper ¼  38.13
  vs. 34 or less                                  lower ¼  30.54
Undergraduate degree: univ. vs.                   univ.    39.31          univ.-tchrs.  2.010*
  tchrs. coll. vs. priv. coll.                    tchrs.   35.81          univ.-priv.   1.286
                                                  priv.    41.18          priv.-tchrs.  3.120**
Class size: 25 or more vs. 16 or less             upper ¼  42.17
                                                  lower ¼  39.89

* Significant at the 5 percent level.
** Significant at the 1 percent level.
[Several adjusted means on the first page of this table are cut off or missing in the source and are shown as printed. An F value of 12.55** also survives without a recoverable row.]


TABLE VIII

CHEMISTRY COMPARISONS: USING THE END SCORES ONLY (d TEST)

Comparison                                        Means

Educational plans: college-bound                  terminal       40.66
  vs. terminal                                    college-bound  48.75
Science credits earned by teachers:               upper ¼  50.85
  80 or more vs. 40 or less                       lower ¼  38.52

* Significant at the 1 percent level.
[Entries that survive in the source without a recoverable row: 5.181*, 70, 216, and .770.]
Note: d tests showed that the two groups in I-B and III-C differed significantly as regards pre-test scores and intelligence test scores.
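
The adjusted means reported in the covariance tables are final-score means corrected for each group's standing on the two covariates, intelligence and pre-test score. A minimal sketch of that correction follows, assuming the usual form: each covariate's deviation from the grand mean is multiplied by its within-groups regression coefficient and removed from the group's final mean. All numbers in the example, the function name, and the argument names are hypothetical; the study's Table IV and its coefficients are not reproduced in this summary.

```python
def adjusted_mean(final_mean, covariate_means, grand_covariate_means, within_slopes):
    """Correct a group's final-score mean for its covariate means.

    covariate_means:        the group's means on each covariate
    grand_covariate_means:  means of the same covariates, both groups combined
    within_slopes:          within-groups regression coefficients (b1, b2, ...)

    A group that starts above the grand mean on a covariate has its final mean
    lowered by slope * difference; a group that starts below has it raised.
    """
    correction = sum(
        b * (group - grand)
        for b, group, grand in zip(within_slopes, covariate_means, grand_covariate_means)
    )
    return final_mean - correction


if __name__ == "__main__":
    # Hypothetical group: final mean 50, intelligence mean 105 against a grand
    # mean of 100, pre-test mean 20 against a grand mean of 18.
    print(adjusted_mean(50.0, (105.0, 20.0), (100.0, 18.0), (0.4, 0.5)))
```

In this made-up case the group's advantage on both covariates lowers its mean from 50 to 47, which is the sense in which the tabled means hold intelligence and pre-test knowledge constant.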





Intercorrelations and Multiple Correlations 


The intercorrelations of the parts of each 
examination with each other revealed that the 
ability to understand and apply the scientific 
method in biological situations was accompanied 
chiefly by the abilities to acquire facts and prin- 
ciples, and that the ability to understand and 
apply the scientific method in chemical situa- 
tions was accompanied chiefly by the ability to 
acquire facts. The intercorrelations for both 
subjects showed that the ability to acquire sci- 
entific attitudes as measured in these examin- 
ations was not highly related to the other abil- 
ities or to intellectual ability as measured. 

The multiple correlations indicated that 53 percent of the variance in the measures of scientific method had been accounted for in biology and that 64 percent of the variance had been accounted for in chemistry. The percentage of influence of each of the factors as revealed by the squares of the standard partial regression coefficients indicated that in both biology and chemistry, intellectual ability contributed the most to the understanding and use of the scientific method, and of the remaining variables factual information in chemistry and the understanding of principles in biology contributed most to the understanding and application of the scientific method.
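
To make the terms concrete, the sketch below computes standard partial regression coefficients and the squared multiple correlation for the simplest case of two predictors, working directly from three correlations. This is a deliberate simplification: the study used four predictors, and the correlations in the example are invented. The function name is likewise an assumption.

```python
import math


def two_predictor_regression(r_y1, r_y2, r_12):
    """Standard partial regression coefficients and R squared for two predictors.

    r_y1, r_y2: correlations of the criterion with predictors 1 and 2
    r_12:       correlation between the two predictors
    Returns (beta1, beta2, r_squared).
    """
    denominator = 1.0 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denominator
    beta2 = (r_y2 - r_y1 * r_12) / denominator
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


if __name__ == "__main__":
    # Hypothetical correlations, e.g. criterion = scientific-method score,
    # predictor 1 = intelligence, predictor 2 = factual information.
    beta1, beta2, r_squared = two_predictor_regression(0.6, 0.5, 0.5)
    print(round(beta1, 3), round(beta2, 3), round(r_squared, 3))
    print(round(beta1 ** 2, 3), round(beta2 ** 2, 3))  # squared coefficients

    # The reported 53 and 64 percent of variance correspond to these
    # multiple correlations (square roots of 0.53 and 0.64).
    print(round(math.sqrt(0.53), 2), round(math.sqrt(0.64), 2))
```

When the predictors are themselves correlated, the squared coefficients do not add up to the squared multiple correlation, so they are best read as a rough ranking of the factors rather than an exact partition of the variance.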


Analysis of the Teacher Schedule 





The teacher schedule served two purposes. 
It furnished: (1) quantified data used as a basis 
for the comparisons described previously, and 
(2) a description of the current practices in the 
teaching of biology and chemistry. 

The 1946 teacher was a younger teacher in 
terms of experience and a less highly special- 
ized but more comprehensively trained one in 
terms of science preparation in college as com- 
pared with the science teacher in 1923 and in 
1936. The median number of quarter hours of 
college science earned was 50.5 and 53.9 for 
biology and chemistry teachers respectively. 

The teachers as a group were lacking in an 
understanding of the scientific method. Once 
out of college they had read few professional 
books on science and science teaching, and less 
than half of them had attended professional meet- 
ings related to science teaching. Of the 91 tea- 
chers of biology and chemistry, 26 percent, 36 
percent, and 38 percent received their under- 
graduate degrees from universities, teachers 
colleges, and private colleges, respectively, 
and 15 percent of these had a Master’s degree. 

Teachers of biology had slightly heavier loads in terms of the number of different kinds of preparation (a teacher having four different sub- [...] chemistry teacher having about four preparations per day and about 100 pupil contacts per day. The median size class was slightly over 20 pupils for both types of teachers. The modal period of instruction was fifty minutes in length.
The teachers of science in this study were in 
the main dependent on laboratory manuals for 
their laboratory instruction in science, although 
they used a variety of other procedures. More 
teachers of chemistry used laboratory manuals 
than did the biology teachers. Most of the lab- 
oratory instruction accompanied class work and 
the demonstration was used as a supplement to 
individual or group laboratory exercises. Chem- 
istry teachers rated their laboratory supplies 
higher in terms of quality and amount than was 
true in the case of biology. Most of the teach- 
ers indicated that their laboratories were equip- 
ped with sources of gas and electricity. All of 
the teachers indicated that their laboratories 
were supplied with running water, but this was 
the only equipment or supply so indicated. Al- 
though the teachers indicated that the develop- 
ment of skill in the use of apparatus was one of 
the chief functions of the science laboratory, it 
was encouraging to note that careful observation, 
understanding and use of the scientific method, 
development of scientific attitudes, and an un- 
derstanding of principles, were functions of the 
science laboratory listed among the first ten 
functions. The median number of laboratory 
hours of instruction received per year was 33 
in biology and 60 in chemistry or slightly less 
than the number recommended by the teachers. 

The teachers were not making full use of aids 
to teaching such as the science club, the field 
trip, and visual aid of the projected type. Only 
12 percent of the teachers sponsored a science 
club and a bare majority took their students on 
planned field trips. The standard slide projec- 
tor, filmstrip projector, and sound movie pro- 
jector were the pieces of equipment most fre- 
quently available and these were used from five 
to ten times per year. The teachers indicated 
a fair supply of science books in the library. 
The median number of magazines on science 
available to students was less than two. 

A majority of the schools indicated that biol- 
ogy was a required subject while less than ten 
percent indicated that chemistry was a required 
subject. 

The teachers as a group had little to offer as 
regards procedures for developing an under- 
standing of principles, the scientific method, 
and scientific attitudes on the part of students. 
Demonstrations by the teacher, following the 
steps in the scientific method, and experiments 
in the laboratory, were the most frequently 
mentioned procedures indicated for each of the 
areas mentioned above. In general, the teach- 
ers of biology and chemistry provided no differ- 
entiation of instruction for those going on to col- 





jects per day would have four preparations), the 





lege, and when they did it was largely quanti- 





174 JOURNAL OF EXPERIMENTAL EDUCATION 


tative in nature. 


Interpretation and Implications 





A review of the literature on science educa- 
tion revealed that many of the former studies 
treated one or more aspects of the problem
under consideration. No one study was similar 
in problem or scope to the present one. The 
chief weaknesses in the past studies were the 
use of matching techniques, now obsolete, and 
the failure to obtain a truly representative sam- 
pling for purposes of statistical treatment. 
Therefore, the present study placed particular 
emphasis on using more efficient statistical 
tools, and securing a sample of schools that 
might be truly representative. 

In the comparisons described, two factors 
were held constant, namely, student intelligence 
and pre-test knowledge of the student. It might 
be that the significance or non-significance ob- 
tained in any one comparison might in part be 
due to the contribution of a factor or factors not 
controlled in the comparisons. Every precau- 
tion was taken in regard to representativeness 
of the sample and fulfillment of assumptions 
underlying the statistical methods used. There- 
fore, considerable confidence can be placed in 
the results obtained. 


The results of the analysis of the teacher 
schedule indicated that science instruction in 
Minnesota high schools is in need of improvement.
Numerous suggestions were made by
teachers; some of these warrant serious
consideration by educators in colleges and
universities directly concerned with the preparation
of science teachers.

Statistical analysis of the test data revealed
those factors which were significant in student
achievement in science. This knowledge college
instructors, state officers of education,
and administrators can utilize in part if not in
total in considering problems in science education.

The results of research in science education 
should be made available to those directly con- 
cerned with their utilization to the purpose that 
boys and girls may be more adequately trained 
in the sciences, not only for the precise purpose 
of training future scientists, but with the broad- 
er objective of opening new worlds of thought and 
endeavor to the average citizen of tomorrow. 
With such purpose in view, the findings and in- 
terpretations of this study were sent to each of 
the teachers cooperating in this study, and to 
some fifty leading science educators. 

Thus, this study has outlined a careful sys- 
tematic way of appraising the growth of students 
toward each of several fundamental objectives 
in two sciences over a year’s interval of time. 
In other words, it constitutes a frontal attack 
on the basic problem of evaluation. It is con- 
ceivable that the method, if followed carefully, 
might prove to be fruitful in areas other than 
science.

Essentially, factor analysis presents a
statistical method of summarizing a matrix of
intercorrelations in such a way as to describe
each of the variables in terms of a limited number
of assumed factors. An important end-product
of the analysis, therefore, is a set of
linear equations describing each of the variables
in terms of the assumed factors. This set of
equations, which may be regarded as multiple
regression equations, will be called a factor
pattern, following Holzinger.1 The analysis of
a correlation matrix, with communalities in
the diagonals, may be made in terms of either
correlated or uncorrelated common factors.
The latter solution is called an orthogonal one,
and is illustrated by such well-known types of
solution as the bi-factor, the centroid, and the
principal-factor solution. A solution in terms
of correlated common factors is called an oblique
solution. It is this type of solution that
Thurstone emphasizes in his recent text.2

The procedure for computing an oblique solution
that has been followed most commonly is
that of first calculating an orthogonal solution,
such as the centroid, and then rotating this initial
orthogonal solution to the desired oblique
solution. In such a procedure the calculation
of the initial orthogonal solution serves primarily
as a method of estimating the number of
common factors, or minimum rank of the given
correlation matrix, R, and of determining the
communalities. Any initial solution that yields
the same number of common factors and the
same communalities would serve equally well
as this intermediate stage in developing the
desired oblique solution. It, therefore, is evident
that the initial orthogonal solution might be
replaced by an initial oblique solution in this
procedure. This opens the possibility of combining
certain earlier developments in factor method
to yield a modified procedure for arriving
at the desired oblique solution.

Holzinger has shown explicitly that the de- 
sired oblique solution with m common factors 
may be calculated quite simply if the variables 
whose correlations make up the matrix can be 
grouped into exactly m distinct clusters. 3 This 
method is essentially one of sectioning the cor- 
relation matrix into groups of variables of ap- 
proximate unit rank, and then passing axes 
through these groups, or clusters. The com- 
putation provides for determining the correla- 
tion of each variable with each of these axes 
(the structure matrix, S) and the intercorrelations,
Φ, of the axes or factors.4 As Holzinger
shows, the oblique pattern, P, is given by:


S Φ^{-1} = P. (1)


He also shows that the adequacy of the fit of the
solution can be tested by reproducing the cor- 
relation matrix, R, thus: 





















SP' = R*, (2) 


and determining the residuals. This method is
extremely useful for securing the desired ob- 
lique solution with a minimum of calculation, 
providing m groups of variables that constitute 
distinct clusters can be identified from the val- 
ues in R. 
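
Equations (1) and (2) can be checked numerically. The following is a minimal sketch in Python with NumPy; the six-variable, two-factor numbers and the names `P_assumed`, `Phi`, `S`, `P`, and `R_star` are illustrative assumptions and are not taken from the tables of this paper.

```python
import numpy as np

# Hypothetical example, not the paper's data: six variables, two oblique factors.
P_assumed = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.2],
                      [0.0, 0.7], [0.0, 0.6], [0.2, 0.5]])
Phi = np.array([[1.0, 0.4],
                [0.4, 1.0]])          # intercorrelations of the oblique axes
S = P_assumed @ Phi                   # structure: correlations of variables with the axes

# Equation (1): the pattern is the structure post-multiplied by the inverse of Phi.
P = S @ np.linalg.inv(Phi)

# Equation (2): reproduced correlations (the diagonal holds the communalities).
R_star = S @ P.T
```

In this sketch the pattern recovered from the structure agrees with the assumed pattern, and the reproduced matrix is symmetric, as equation (2) requires.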

For many correlation matrices, however,
distinct clusters may not be readily identifiable.
Thurstone has proposed that this same method
of sectioning R and passing axes through clusters
of tests be employed generally to calculate
an arbitrary oblique structure, and that this oblique
structure then be transformed into an orthogonal
solution that can be taken as the initial
solution for rotation purposes.5 This procedure
emphasizes the arbitrary oblique structure
as an intermediate stage in the calculation
of the initial orthogonal solution. Schematically
this procedure might be represented as:


R → O → F → Primary Solution,


where R designates the correlation matrix, O
the arbitrary oblique solution, and F the orthogonal
solution. The modification in procedure
that is suggested in the present paper is that
of omitting the calculation of F by devising a
procedure for rotating the initial oblique solution
directly to the desired primary solution.


"A Simple Method of Factor Analysis," Psychometrika,


The principle involved in computing S is the familiar one of correlating each
variable with a group of variables; the relationship to the centroid method
is therefore readily apparent.


In following this proposed procedure, the 
first problem is that of calculating an arbitrary 
oblique solution for which the number of factors, 
m, equals the minimum rank of R, and which 
meets the requirement of reproducing the cor- 
relation matrix within the desired limits of 
sampling error. The second operation is that
of determining the new coordinate system that 
will yield the desired oblique solution. The 
method proposed here is based on principles 
outlined earlier.6 The final operation is that 
of making the rotation from the given set of ob- 
lique axes to the new set. The calculation rou- 
tine will be illustrated in the body of the paper. 
An appendix contains a matrix formulation of 
the multiple-group method and of the rotation 
procedure. This will be of interest primarily 
to those who wish to examine the mathematical 
bases of these operations. 

A fictitious example has been prepared to il- 
lustrate the proposed method of analysis. Such 
an example has the advantage of permitting a 
clear demonstration that the method will reproduce
the assumed factor pattern. The correlation
matrix given in Table I was constructed
by specifying the primary pattern, P_p, and the
intercorrelations of the axes, Φ_p, and performing
the matrix multiplication:


P_p Φ_p P_p' = R. (3)


Since the communalities ordinarily are not 
known, the diagonal elements of R have been 
omitted from Table I. 


Calculating the initial oblique solution: 


The first operation is that of calculating an
initial oblique solution that will reproduce the
correlation matrix within the desired limits of
sampling error. Two related problems must
be solved. One is the determination of the number
of common factors to be extracted; the
other is the securing of satisfactory estimates
of the communalities. As trial estimates of
the communality, the highest correlation in
each column of R has been taken. These are
written in a row below R in Table I. In order
to examine the appearance of the test configuration
and derive an hypothesis regarding the
number of factors, three arbitrary groups of
variables will be summed and the resulting matrix
extended and plotted.

The variables are grouped arbitrarily as 
follows: 

Group A_1: 1, 2, 3, 4
Group B_1: 5, 6, 7, 8, 9
Group C_1: 10, 11, 12, 13, 14


The matrix G_1' in Table I gives the sums of the
correlations of each variable with each group.
The rows designate the three groups, and the
columns the variables. Thus, the second entry
in the first row is the sum of three correlations,
.63, .58, and .38, plus .65, which is
the trial estimate of the communality of variable
2. The second entry in the second row is
the sum of the correlation of variable 2 with
each of the five variables of Group B. The matrix
G_1' can now be extended.7 This is done
by dividing the entries in each column by the
entry in the first row of that column. These
extended values are given at the bottom of Table I.
Rows B_1 and C_1 of this matrix are then
used to plot the fourteen variables. The resulting
diagram is given as Figure 1.

trix," Psychometrika, X (June 1945), pp. 73-78. Apparently at the time of
publication of his paper, Thurstone did not recognize that Holzinger's earlier
paper had outlined this method of calculating an oblique structure.

Journal of Educational Psychology, XXXIX (December 1948), pp. 449-468; and
"Projections of Three Types of Factor Pattern," Journal of Experimental
Education, XVII (March 1949), pp. 335-345.

See Thurstone, op. cit., Chapter XI.

The rows of G' are proportional to the rows of S', or the correlations of the
variables with the arbitrary factors; consequently, extending G' yields a plot
that differs from the one that would result from extending S only in the scale
that is employed.
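
The steps just described (trial communalities, group sums, and extension of the summed matrix) can be sketched as follows. This is a minimal Python/NumPy illustration under stated assumptions: the function names `trial_communalities`, `group_sums`, and `extend` are hypothetical, the six-variable correlations are made up rather than taken from Table I, and variables are indexed from zero.

```python
import numpy as np

def trial_communalities(R):
    """Return a copy of R with the highest correlation of each column on the diagonal."""
    R = np.array(R, dtype=float)
    np.fill_diagonal(R, -np.inf)          # ignore whatever was on the diagonal
    np.fill_diagonal(R, R.max(axis=0))    # highest remaining entry in each column
    return R

def group_sums(R, groups):
    """G' (groups x variables): sum of each variable's correlations with each group."""
    E = np.zeros((len(groups), R.shape[0]))
    for row, members in enumerate(groups):
        E[row, members] = 1.0
    return E @ R

def extend(G_t):
    """Divide the entries in each column by the entry in the first row of that column."""
    return G_t / G_t[0]

# Hypothetical correlations for six variables (not the values of Table I).
P = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.2],
              [0.0, 0.7], [0.0, 0.6], [0.2, 0.5]])
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])
R = trial_communalities(P @ Phi @ P.T)

G_t = group_sums(R, groups=[[0, 1, 2], [3, 4, 5]])   # two arbitrary groups
extended = extend(G_t)                               # first row becomes all ones
```

With more than two groups, the rows of `extended` below the first would be plotted against one another, as is done for Figure 1.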

A diagram of this type should suggest an in- 
itial hypothesis regarding the number of com- 
mon factors. Since the variables for this problem
do not form a single cluster nor all lie essentially
on one straight line in the diagram,
the minimum number of common factors should
be at least three. If three common factors are
adequate to reproduce the correlation matrix,
with communalities in the diagonals, then it
should be possible to enclose the plotted points
in a triangle, the lines of which pass through or
close to points representing variables. (Since
the correlations in R are all positive, a positive
manifold is expected.) If this were done
for this problem, one vertex of the triangle 
would lie at 7, another at 10, and the third at 
the point marked z in Figure 1. This hypoth- 
esis regards variables 1, 2, 3, and 4 as con- 
stituting a cluster that lies close to the postu- 
lated third factor. Before acting on such an 
hypothesis, it is wise to examine the correla- 
tions of these variables with each other. From 
R, it is evident that both variables 1 and 2 have 
higher correlations with variables 5, 6, and 14 
than with variable 4. The correlation of vari- 
ables 1 and 4 is quite low (.18), in comparison
with other entries in those columns of R. In 
addition, variable 4 is most closely related to 
variables 3, 12, and 13. This evidence sug- 
gests that variables 1, 2, 3, and 4 may not be 
considered as a single cluster. If so, at least 
four common factors will be needed to repro- 
duce the matrix. 

A plane representation of a tetrahedron may 
be taken as the model for the representation in 
a diagram of this type of a four-factor pattern.8 
Four points joined by six straight lines repre- 
sent a tetrahedron on a plane. Such a figure 
provides a reasonably adequate fit for the var- 
iables of this problem as plotted in Figure 1. 
Variables 7, 10, 4, and 1 mark the corners or
vertices of the plane figure, and the remaining 
variables lie on or close to the six lines that 
may be drawn to join the pairs of these four 
points. An hypothesis of at least four common 
factors will therefore be taken as the initial 
hypothesis. 

In order to test this hypothesis of four fac- 
tors, the variables are regrouped into four 
groups, four factors extracted, and the fit of 
such a solution tested by computing the resid- 
uals. In planning the regrouping, variables 
that appear from Figure 1 to cluster near the 
assumed vertices of the representation of a
tetrahedron are grouped together, as follows:

Group A_2: 1, 2, 5
Group B_2: 3, 4, 12
Group C_2: 6, 7, 8, 13
Group D_2: 9, 10, 11, 14


The correlation matrix is summed for these
new groups to yield matrix G_2' of Table II.
Again, each entry in G_2' represents the sum of
the correlations of the variable in that column
with those variables making up the group designated
by the row. The trial communality estimates
are retained.

Matrix H_2 is calculated from G_2'. It represents
the sum of the correlations of the variables
in each group with those in each of the
groups, and should be symmetric. The second
entry in the third row of H_2, for example, is
the sum of 1.55, 1.25, and 1.89, or the entries
for variables 3, 4, and 12 in the third row of
G_2'. The inverse of H_2 is then secured.9 This
matrix is also given in Table II. The matrix
multiplication G_2 H_2^{-1} G_2' then gives a reproduced
correlation matrix that may be subtracted
from the original correlation matrix to yield
the residuals. In performing this matrix multiplication,
certain checks on the calculation
are available. The multiplication G_2 H_2^{-1}
yields, for this problem, a matrix of 14 rows
and 4 columns. If each column of this matrix
is summed for those variables making up the
four groups, the result should be the identity
matrix, I, if the work has been accurate. Furthermore,
the reproduced correlation matrix,
when summed by columns in the same fashion,
should yield the matrix G_2' if the multiplication
is not in error. These checks are of considerable
practical value.
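
The reproduction of the correlation matrix and the two checks described above can be sketched as follows. This is a minimal Python/NumPy illustration with a made-up six-variable, two-group example; the names `E`, `G`, `H`, `W`, `check_identity`, and `check_group_sums` are assumptions introduced here, and the numbers are not those of Table II.

```python
import numpy as np

# Hypothetical example: R has communalities in the diagonal and is of rank 2.
P = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.2],
              [0.0, 0.7], [0.0, 0.6], [0.2, 0.5]])
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])
R = P @ Phi @ P.T

# Grouping matrix: row k has 1 for every variable in group k, 0 elsewhere.
E = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)

G = R @ E.T                      # sums of correlations with each group (6 x 2)
H = E @ G                        # sums of correlations between groups (2 x 2)
W = G @ np.linalg.inv(H)         # the product G H^{-1}
R_star = W @ G.T                 # reproduced correlation matrix
residuals = R - R_star

# First check: summing the columns of G H^{-1} by groups gives the identity matrix.
check_identity = E @ W
# Second check: summing the reproduced matrix by groups gives G' again.
check_group_sums = E @ R_star
```

Because the assumed matrix here is exactly of rank two, the residuals vanish apart from rounding; with empirical correlations they would only be small.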

The matrix of reproduced correlation values
is given in Table III. These values are
then subtracted from the corresponding values
in the original correlation matrix to give a set
of residuals. These residuals, which appear
in italics above the principal diagonal of Table III,
are quite small. Consequently, the hypothesis
of four common factors appears to be a
satisfactory one for the given correlation ma- 
trix. The elements in the principal diagonal 
of Table III are the calculated communalities 
for four common factors. Comparison of 
these with the trial estimates of Table I shows 
discrepancies, especially for variables 13 and 
14. 

The specification of four common factors 
and the determination of new communality estimates
permit the calculation of the initial oblique
solution. The necessary adjustments in
communality estimates from those given in
Table I as the trial estimates are presented in
the first row of Table IV. These values represent
those of Table I subtracted from the corresponding
diagonal elements of Table III. Matrix
G_3' is then written, as shown in Table IV.
It is merely matrix G_2' with the adjustments in
communality estimates incorporated, since no
regrouping was necessary. Matrix H_3 is also
given; it differs from matrix H_2 only in its diagonal
elements, since only the communality
estimates were modified to secure G_3'. The
inverse of H_3 is also given in Table IV. Since
this initial oblique solution is to be rotated by
the method outlined in the next section, it is
not necessary to reduce G_3' to a structure matrix
or H_3 to the intercorrelations of the oblique
axes; this may be done, however, if one wishes.10
Also, since this rotation is to be made, it is
not necessary to calculate the pattern for this
initial solution, though this too may be done if
it seems desirable.

8. See Harris, "Projections of Three Types of Factor Pattern," op. cit.

9. See the texts by Holzinger and Harman, or Thurstone, for the calculation
of an inverse, and for rules of matrix multiplication.





Determining the new coordinate system: 





In developing the initial oblique solution it 
was necessary to advance an hypothesis regard- 
ing the number of common factors. This hy- 
pothesis of four factors was derived from an 
examination of the extended-vectors represen- 
tation of the variables and was supported by the 
small magnitude of the residual correlations, 
after the extraction of four common factors. 

In choosing a new coordinate system, i.e., in
deciding where to rotate the initial solution,
the extended-vectors representation of the variables
will again be utilized. The extended G_3'
values are given in Table V. These are calculated
as before, by dividing each column of G_3'
by the entry in the first row. Figures 2a, 2b,
and 2c result from plotting on orthogonal graph
paper rows B with C, B with D, and C
with D. The four-space hypertetrahedron may
be taken as a geometric model for the four- 
factor problem. Extending the vectors projects 
this four-space hypertetrahedron into a three- 
space tetrahedron. There will then be three 
distinctive representations on a plane of this 
tetrahedron, and each will consist of four points, 
the pairs of which are joined by six straight 
lines. If this configuration is consistent with 
the positions of points representing variables 

in the three plots, then only a single rotation 
will be necessary to secure the primary solu- 
tion. 

Examination of Figures 2a, 2b, and 2c indi- 
cates that the extensions of variables 1, 4, 7, 
and 10 may be taken as the four points referred 
to above, and that each of the remaining vari- 
ables lies on or close to one of the six straight 
lines that might be drawn to join pairs of these
points. Further, the arrangement of points representing
variables has a necessary consistency
from diagram to diagram. For example, vari- 
ables 5 and 6 consistently lie close to the 
straight line that would connect the vertices 
designated by variables 1 and 7 in the three 
figures. Similarly, variables 8 and 9 consis- 
tently lie on a line between variables 7 and 10, 
and variable 14 consistently lies on a line be- 
tween variables 1 and 10. These diagrams 
therefore suggest that if the primary axes are 
taken as colinear with variables 1, 4, 7, and 
10, the rotated oblique solution will yield a 
pattern that may be represented schematically 
as: 


Variable   P_1  P_2  P_3  P_4
    1       +    0    0    0
    2       +    +    0    0
    3       +    +    0    0
    4       0    +    0    0
    5       +    0    +    0
    6       +    0    +    0
    7       0    0    +    0
    8       0    0    +    +
    9       0    0    +    +
   10       0    0    0    +
   11       0    +    0    +
   12       0    +    0    +
   13       0    +    +    0
   14       +    0    0    +


where + designates a substantial positive coefficient.
This interpretation of the diagrams is
taken as the guide to rotation.


Rotation of the initial solution: 





The rotation of one oblique solution to an- 
other may be accomplished in two ways. One 
method is to have or make available an ortho- 
gonal factor matrix as a reference matrix, and 
to calculate the direction ratios that rotate the 
given oblique solution into the new one by ref- 
erence to the direction cosines that rotate the 
orthogonal solution into the given oblique one. 11 
Another method is to develop the desired ratios 
directly from the data of the given oblique so- 
lution. This latter method is illustrated here. 

The plan for this rotation has been discuss- 
ed above. It is to take the four primary axes 
as colinear with variables 1, 4, 7, and 10. 
These four respective columns of matrix G_3'
give values that are proportional to the correlation
of each of these variables with the four
oblique axes of the initial solution. These entries
are copied as the rows of matrix C, which
appears in Table VI. Next, matrix Y is constructed
by post-multiplying matrix C by the inverse
of H_3, as is indicated in Table VI. Just
as the rows of matrix C are proportional to the
correlations of each of these four variables
with the original oblique axes, so the rows of
Y are proportional to the coordinates of these
four variables with respect to the original oblique
axes. It is generally true that if we have
available two square matrices, one consisting
of the correlations of a set of vectors with an
original set of unit-length axes, and the other
consisting of the coordinates of the set of vectors
with respect to the original axes, then
post-multiplying either of these matrices by
the transpose of the other will yield a square
symmetric matrix that is an elementary transformation
of the intercorrelations of the set of
vectors when taken at unit length. It is shown
in the appendix that this also holds for matrices
of the form C and Y used here; consequently,
the post-multiplication of Y by the transpose
of C is performed, as shown in Table VI, to
yield matrix Z.

10. See Appendix, equations (5) and (6).

11. See Thurstone, Multiple-Factor Analysis, op. cit., for illustrations.

This matrix Z may be described as the ma- 
trix of intercorrelations of the new primary 
axes, when they are taken as having the lengths 
of the variables 1, 4, 7, and 10 in the common- 
factor space. The next step, therefore, is to 
give, arbitrarily, these new axes unit length. 
The diagonal entries of matrix L are the recip- 
rocals of the square-roots of the diagonal en- 
tries of Z. By pre- and post-multiplying Z by 
L, Z is transformed into the intercorrelations 
of the primary axes when taken at unit length. 
This is shown in two steps in Table VI, with
Φ_p designating the intercorrelations of the primary
axes.

Matrix A, which will rotate matrix G_3 into
the primary structure, S_p, can now be written.
It is equal to Y'L, as shown in Table VI. The
principle involved is that the coordinates of a
point when divided by the distance from the origin
to that point give the direction ratios of the
vector joining the origin and that point, with respect
to the axes of the given coordinate system.
(If the given coordinate system had been
orthogonal, the direction ratios would, of
course, have the more familiar form of direction
cosines.) Here, the rows of matrix Y are
proportional to coordinates and consequently
the columns of matrix A are proportional to the
direction ratios of the primary axes. It is
shown in the appendix that each column of A has
been decreased from the "true" values of the
direction ratios by a factor equal to that by
which the corresponding column of G_3 is increased
over the actual correlations of the variables
with the original oblique axis. It therefore
follows that post-multiplying G_3 by A gives
the correlations of the variables with the primary
axes taken at unit length.
Table VI also includes the inverse of the intercorrelations
of the primary axes, Φ_p^{-1}.
This inverse is utilized to yield the primary
pattern, by means of the equation:


S_p Φ_p^{-1} = P_p. (1a)


The primary structure, S_p, which gives the
correlations of the variables with each of the
primary axes; and the primary pattern, P_p,
which gives the coordinates of the variables
with respect to the primary axes, are given in
Table VII. The calculated primary pattern is
consistent with the schematic representation
that was deduced from the diagrams above, and
it reproduces, within rounding errors, the hypothetical
pattern.
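
The rotation routine of this section (matrices C, Y, Z, L, and A, followed by the primary structure and pattern) can be sketched compactly. This is a minimal Python/NumPy illustration; the function name `rotate_to_primary` and its arguments are hypothetical, the six-variable data are made up, and variables are indexed from zero. It assumes that the variables chosen as axes are collinear with the primary axes, as variables 1, 4, 7, and 10 are taken to be in the text.

```python
import numpy as np

def rotate_to_primary(R, E, axis_vars):
    """Rotate a multiple-group solution directly to the primary solution.

    R: correlation matrix with communalities in the diagonal.
    E: grouping matrix of ones and zeros (groups x variables).
    axis_vars: indices of the variables taken as collinear with the primary axes.
    """
    G = R @ E.T                                  # group sums, variables x groups
    H = E @ G                                    # sums between groups
    C = G[axis_vars]                             # rows of G for the chosen variables
    Y = C @ np.linalg.inv(H)                     # proportional to their coordinates
    Z = Y @ C.T                                  # intercorrelations at original lengths
    L = np.diag(1.0 / np.sqrt(np.diag(Z)))       # gives the new axes unit length
    Phi_p = L @ Z @ L                            # intercorrelations of primary axes
    A = Y.T @ L                                  # rotation matrix
    S_p = G @ A                                  # primary structure
    P_p = S_p @ np.linalg.inv(Phi_p)             # primary pattern, equation (1a)
    return S_p, P_p, Phi_p

# Hypothetical example: variables 0 and 3 load on one factor only.
P_assumed = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.2],
                      [0.0, 0.7], [0.0, 0.6], [0.2, 0.5]])
Phi = np.array([[1.0, 0.4], [0.4, 1.0]])
R = P_assumed @ Phi @ P_assumed.T
E = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)

S_p, P_p, Phi_p = rotate_to_primary(R, E, axis_vars=[0, 3])
```

In this constructed case the routine returns the assumed pattern and axis intercorrelations, which mirrors the paper's use of a fictitious example to show that the method reproduces the hypothetical pattern.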


Summary: 


The major purpose of this paper has been to 
suggest that an initial oblique solution, such as
that given by the multiple-group method, may 
be rotated directly to the primary solution with- 
out the necessity of calculating an intermediate 
orthogonal solution. The first problem is that 
of securing an initial oblique solution for which 
the number of common factors and the commun- 
alities are adequate to reproduce the given cor- 
relation matrix within the limits that are to be 
accepted. This problem, of course, is com- 
mon to all initial solutions, orthogonal and ob- 
lique, in which only the common-factor vari- 
ance of the variables is to be analyzed. 

Once the initial oblique solution has been se- 
cured, it may be rotated without the necessity 
of first transforming it to an orthogonal solu- 
tion. This rotation might be done in one of 
several modes. 12 The mode illustrated here 
is that of a single rotation to the primary solu- 
tion by choosing the primary axes in terms of 
the geometric model of the hypertetrahedron. 
For a problem that exhibits what Thurstone 
calls a ‘‘compelling simple structure’’ this 
method of rotation is an especially economical 
one. The procedure, as illustrated in the paper, 
is that of locating from the extended-vector rep- 
resentation of the variables the points repre- 
senting the extensions of the primary axes. 

Once these points have been located, their 
coordinates with respect to the initial oblique 
axes may be calculated and then transformed 
into direction ratios, which in effect give the 
primary axes unit length. This matrix of di- 
rection ratios transforms the initial structure 
into the primary structure. The primary pattern,
the columns of which are proportional to
the columns of the familiar V matrix used by
Thurstone, may then be calculated from the
available data. The primary structure, the
primary pattern, and the intercorrelations of
the primary axes give a complete description
of the rotated oblique solution.

12. For example, the principle of Thurstone's single-plane method of rotation
may be employed by rotating a column of the pattern of the initial oblique
solution with respect to the several columns of the oblique structure. The
necessary modifications in calculation procedure can readily be inferred
from the discussion in the Appendix.

TABLE VII
ROTATED OBLIQUE SOLUTION
Variable    Pattern


APPENDIX 


Attention should be called to an article by 
Guttman, antedating both Holzinger’s and Thur- 
stone’s descriptions of the multiple-group meth- 
od, that provides a general matrix formulation 
of this method and suggests "methods for extracting
several factors at a time, be they oblique
or orthogonal."1 Guttman develops an
equation of the form:


RX'(XRX')^{-1}XR = R*, (1)


where R is the given correlation matrix and R*
is the reproduced correlation matrix.2 He
proves that if r is the rank of R and s the rank
of X (r>s), the residual matrix, R - R*, has a
rank of r - s and is Gramian; he also shows
that the solution is indeterminate.3 Guttman
then points out the nature of the X matrix that 
is employed in the centroid and the principal- 
axes methods of factoring. A matrix formula- 
tion of the multiple-group method will extend 
Guttman’s work by defining explicitly the nature 
of his X matrix for this method. 
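
Guttman's equation (1) and the stated rank property can be illustrated numerically. This is a minimal Python/NumPy sketch with made-up numbers: `F` is a hypothetical orthogonal factor matrix used only to build an R of rank 3, and `X` is taken here as a grouping matrix of rank 2, so the residual should have rank 3 - 2 = 1 and be Gramian (no negative latent roots).

```python
import numpy as np

# Hypothetical factor matrix: six variables, three common factors (not the paper's data).
F = np.array([[0.8, 0.0, 0.0], [0.7, 0.2, 0.0], [0.6, 0.0, 0.3],
              [0.0, 0.7, 0.0], [0.0, 0.6, 0.3], [0.2, 0.5, 0.4]])
R = F @ F.T                                   # rank r = 3, communalities in the diagonal

X = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]], dtype=float)   # rank s = 2

# Guttman's equation (1): R X' (X R X')^{-1} X R = R*
R_star = R @ X.T @ np.linalg.inv(X @ R @ X.T) @ X @ R
residual = R - R_star

rank_of_residual = np.linalg.matrix_rank(residual, tol=1e-8)
smallest_root = np.linalg.eigvalsh(residual).min()
```

Here `rank_of_residual` comes out as r - s, and `smallest_root` is zero apart from rounding, which is what the Gramian property requires.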

Assume R, of order n by n, with communalities
in the diagonals that give it rank m (m<n).
Construct a matrix, E, of order m by n and
rank m, whose elements are either 1 or 0, that
may be used to sum R by sections. For example,
in the illustrative problem given in the
body of the paper, this matrix was, in effect,
constructed to secure the four-factor initial
solution:


          |  1  2  3  4  5  6  7  8  9 10 11 12 13 14
Group A   |  1  1  0  0  1  0  0  0  0  0  0  0  0  0
Group B   |  0  0  1  1  0  0  0  0  0  0  0  1  0  0
Group C   |  0  0  0  0  0  1  1  1  0  0  0  0  1  0
Group D   |  0  0  0  0  0  0  0  0  1  1  1  0  0  1


Here each row of E has 1 as the element for
each variable that makes up the group designated
by the row; all other elements are 0. Ordinarily
the non-zero entries are positive. As
Guttman points out, a negative entry reflects
the variable; he also suggests that the use of
weights other than unity or zero has interesting
possibilities.

A matrix formulation of the operation of summing
the sections of R in the multiple-group
method is then given by:


RE' = G (2)
and ER = G'. (3)


Let ERE' = H, (4)
where H is a square, symmetric matrix, of
order m by m and of maximum rank. In either
Holzinger's or Thurstone's employment of the
multiple-group method, the condition that H be
of maximum rank is mandatory. H also must
be such that when pre- and post-multiplied by
the diagonal matrix formed from the square-roots
of the reciprocals of its diagonal entries,
it yields a correlation matrix, Φ, with unity in
the diagonals.

An expression for the diagonal matrix that
transforms H into Φ is needed. Note that G is
a matrix of order n by m, where the rows designate
the variables and the columns the groups
of variables specified by E. Select for a new
matrix, J, of order n by m, elements of zero
and those elements of G which give the sums of
the correlations of each variable with the group
to which it belongs.4 Then EJ gives the diagonal
elements of H, and (√EJ)^{-1} gives the desired
diagonal matrix. Thus:


(√EJ)^{-1} H (√EJ)^{-1} = Φ. (5)


The matrix Φ gives the intercorrelations of the
oblique factors extracted by this method when
the factors are taken at unit length.
1. Louis Guttman, "General Theory and Methods for Matric Factoring," Psychometrika, IX (March 1944), pp. 1-16.

2. Ibid., p. 12. His notation has been modified.

3. Ibid., pp. 12-13.

4. It is of interest to note that G - J is the type of matrix Holzinger uses
to calculate the general factor coefficients by the bi-factor method.


The correlation of each variable with each of
these oblique factors is given by:


$$G(\sqrt{EJ})^{-1} = S \qquad (6)$$
and
$$(\sqrt{EJ})^{-1}G' = S' \qquad (7)$$


For any oblique solution:
$$S\Phi^{-1}S' = R^{*}. \qquad (8)$$


By employing the expressions in (2) through (8) 
we may write: 


$$RE'(\sqrt{EJ})^{-1}\left[(\sqrt{EJ})^{-1}ERE'(\sqrt{EJ})^{-1}\right]^{-1}(\sqrt{EJ})^{-1}ER = R^{*}, \qquad (9)$$


which simplifies to:
$$RE'(ERE')^{-1}ER = R^{*}. \qquad (10)$$


Equation (10) is of the form of (1), with E sub- 
stituted for the matrix X. This reduction 
therefore demonstrates that the multiple-group 
method of factor analysis is included in Gutt- 
man’s general formulation of matric factoring. 
It also demonstrates that the initial oblique so- 
lution obtained by the multiple-group method 
is completely determined, with R fixed, by the 
choice of E—that is, by the grouping of the 
variables. 

In the calculation of the initial oblique solution
by the multiple-group method it is possible
to use matrices G and $H^{-1}$ to reproduce R
and thus test the fit of the postulated number
of factors. By substituting (2), (3), (4) in (10)
we secure:


$$GH^{-1}G' = R^{*}. \qquad (11)$$


The use of (11) in preference to (8) is suggest- 
ed by convenient checks on the accuracy of the 
multiplication. First, from (2) and (4) it is 
apparent that: 


$$EGH^{-1} = I. \qquad (12)$$


Summing the columns of $GH^{-1}$ for the groups
of variables therefore should yield the identity
matrix if the work has been accurate. Second,
from (12) it follows that:


$$EGH^{-1}G' = G' \qquad (13)$$
or
$$ER^{*} = G' \qquad (14)$$


Summing the sections of the reproduced correl- 
ation matrix therefore yields G' if the second 
multiplication has been performed correctly. 
These checks are of considerable practical 
value. In addition, comparison of (14) and (3) 
shows that the sections of the residual matrix, 
R - R*, must sum to zero. 
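
The identities in (2) through (14) can be checked numerically on a small case. The following is a minimal sketch in plain Python, assuming a hypothetical problem with four variables and two groups; the loading matrix `F`, the grouping matrix `E`, and all helper names are illustrative assumptions and do not come from the paper. R is built as FF' so that its rank equals the number of groups, in which case (11) should reproduce R exactly.

```python
# Hypothetical example: 4 variables, 2 groups. R = F F' has rank 2 with
# communalities in the diagonal, so a two-group solution fits exactly.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(h):
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

F = [[0.8, 0.1], [0.7, 0.2], [0.2, 0.6], [0.1, 0.7]]  # assumed loadings
R = matmul(F, transpose(F))
E = [[1, 1, 0, 0],   # group A: variables 1 and 2
     [0, 0, 1, 1]]   # group B: variables 3 and 4

G = matmul(R, transpose(E))             # equation (2)
H = matmul(E, G)                        # equation (4), since ERE' = EG
GH_inv = matmul(G, inverse_2x2(H))
R_star = matmul(GH_inv, transpose(G))   # equation (11)

check_12 = matmul(E, GH_inv)            # should be the identity matrix
check_14 = matmul(E, R_star)            # should equal G'

# Square roots of the diagonal of H, used to scale H into Phi (5)
# and G into the structure S (6).
d = [H[j][j] ** 0.5 for j in range(2)]
Phi = [[H[i][j] / (d[i] * d[j]) for j in range(2)] for i in range(2)]
S = [[G[i][j] / d[j] for j in range(2)] for i in range(4)]

print(max_abs_diff(check_12, [[1, 0], [0, 1]]))
print(max_abs_diff(check_14, transpose(G)))
print(max_abs_diff(R_star, R))
```

All three printed differences should be at the level of rounding error. With an empirical R of higher rank, the third difference measures the residual, R - R*, rather than vanishing.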

If S designates the matrix of correlations of 
the variables with the oblique factors, then the 
matrix of coordinates of the variables with respect
to these factors may be designated P, thus:
$$S\Phi^{-1} = P \qquad (15)$$

Combining (6) and the inverse of (5),
$$GH^{-1}\sqrt{EJ} = P. \qquad (16)$$


If the fit, as tested by (11), is adequate, then
the complete oblique solution may be calculated
from (6), (5), and (16).

If the fit is not adequate, then it may be desirable
to extract additional factors. This might
be done by beginning again with an augmented E
matrix and thus discarding the work already done,
or by factoring the residual matrix and augmenting
the G and H that have already been calculated.
Since factors extracted from the residual matrix
by this method will be orthogonal to those
factors previously extracted from R, though
intercorrelated among themselves, the complete
matrices G and H, and consequently the resulting
structure and pattern matrices, can readily
be written.


Rotation from one set of oblique axes to a sec- 
ond set: 





Given an initial oblique solution, S, $\Phi$, and
P, that reproduces R satisfactorily, it may be
desired to rotate this solution to a second oblique
solution, $S_p$, $\Phi_p$, and $P_p$. Assume a matrix
A, such that:


$$SA = S_p \qquad (17)$$


Also assume an orthogonal factor matrix, F, 
and a matrix of direction cosines, T, such 
that: 


$$FT = S \qquad (18)$$


Then:
$$FTA = S_p. \qquad (19)$$


Therefore TA may be taken as a matrix of direction
cosines that transforms F into $S_p$. Consequently,
by definition of the scalar products
of a set of unit length vectors:


$$(TA)'(TA) = \Phi_p \qquad (20)$$
or,
$$A'T'TA = \Phi_p. \qquad (21)$$


But T'T equals $\Phi$, since T is the matrix of direction
cosines that satisfies (18), and so by
substitution:


$$A'\Phi A = \Phi_p \qquad (22)$$


If A can be written, then $S_p$ will be given by (17),
$\Phi_p$ by (22), and $P_p$ by:


$$S_p\Phi_p^{-1} = P_p, \qquad (23)$$
which is analogous to (15).

Since the initial set of axes is oblique, matrix
A must be the matrix of direction ratios,
rather than direction cosines, of the new set of 
axes with respect to the old. The direction ra- 
tios of a vector consist of its coordinates with 
respect to the given reference axes, each di- 
vided by the length of the vector. Since the co- 
ordinates of the vector are proportional to its 
direction ratios, they may be called its direc- 
tion numbers. In order to calculate A, define 
new matrices as follows: Let matrix C be a 
matrix of correlations of the new axes with the 
given ones, with the provision that these new 
axes may be given, initially, lengths other than 
unity. Define matrix Y as the matrix of coordinates
of these new axes with respect to the
given ones, or the matrix of direction numbers.


Then:
$$C\Phi^{-1} = Y. \qquad (24)$$


Since Y is a pattern matrix, or a matrix of 
coordinates, we may write: 


$$Y\Phi Y' = Z, \qquad (25)$$


where Z is a symmetric, square matrix that
gives the intercorrelations of these new axes
when they are taken at whatever length was determined
by the entries in C. The diagonal
elements of Z give the squares of these lengths.
Let a diagonal matrix L have as elements the
reciprocals of the square roots of the diagonal
elements of Z. Then:


$$LY = A' \qquad (26)$$
and
$$Y'L = A. \qquad (27)$$


This formulation of the rotation procedure is 
general. It will be noted that for the special 
case of rotating from an initial orthogonal solution,
matrices C and Y are identical and $\Phi$
is the identity matrix. The operation of ‘‘normalizing’’
direction numbers to turn them into
direction cosines is then represented by equa- 
tion (27). 

Ordinarily, matrix A will be a square ma- 
trix of maximum rank and consequently non- 
singular. By employing (23) and substituting 
(17) and the inverse of (22), we may write:
$$S\Phi^{-1}(A')^{-1} = P_p, \qquad (28)$$
or
$$P(A')^{-1} = P_p. \qquad (29)$$


Equations (17) and (29) make explicit the rela- 
tionship between the two transformation ma- 
trices that rotate a structure and its corres- 
ponding pattern into the desired structure and 
pattern. One is the inverse of the transpose of 
the other. 
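
The rotation described by (17) and (22) through (29) can be traced on a small numerical case. The sketch below is only an illustration in plain Python: the matrices `Phi`, `S`, and `C` are made-up values for a two-factor problem, and the helper names are assumptions introduced here. It forms Y, Z, L, and A, rotates the structure and the pattern, and then confirms that the rotated axes have unit length and that the rotated structure and pattern agree.

```python
# Hypothetical two-factor example of rotating one oblique solution to another.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(h):
    (a, b), (c, d) = h
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

Phi = [[1.0, 0.3], [0.3, 1.0]]            # assumed initial factor intercorrelations
S = [[0.7, 0.3], [0.6, 0.2],
     [0.3, 0.8], [0.2, 0.6]]              # assumed initial structure
C = [[1.0, 0.5], [0.2, 0.9]]              # assumed correlations of new axes (rows)
                                          # with the given axes, any length

Phi_inv = inverse_2x2(Phi)
P = matmul(S, Phi_inv)                            # pattern, equation (15)
Y = matmul(C, Phi_inv)                            # direction numbers, (24)
Z = matmul(matmul(Y, Phi), transpose(Y))          # equation (25)
L = [[Z[0][0] ** -0.5, 0.0],
     [0.0, Z[1][1] ** -0.5]]                      # reciprocal lengths
A = matmul(transpose(Y), L)                       # direction ratios, (27)

S_p = matmul(S, A)                                # rotated structure, (17)
Phi_p = matmul(matmul(transpose(A), Phi), A)      # equation (22)
P_p = matmul(P, inverse_2x2(transpose(A)))        # rotated pattern, (29)

LZL = matmul(matmul(L, Z), L)
print(Phi_p[0][0], Phi_p[1][1])                   # both should be 1
print(max_abs_diff(Phi_p, LZL))                   # Phi_p should equal LZL
print(max_abs_diff(matmul(P_p, Phi_p), S_p))      # (23) rearranged
```

The last check uses (23) in the form $P_p\Phi_p = S_p$, which avoids a second matrix inversion.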

In employing the multiple-group method to 
calculate an initial oblique solution that is to be 
rotated, it is convenient to work with matrices 
G and $H^{-1}$ of the initial solution rather than calculating
S and $\Phi^{-1}$. In order to make the rotation,
write the matrix $C_1$, rather than C, where:


$$C\sqrt{EJ} = C_1. \qquad (30)$$
This can be written directly from the entries in
G providing the new axes can be specified as
linear combinations of the variables. Then:


$$C_1H^{-1} = Y_1 \qquad (31)$$
and
$$Y_1C_1' = C\sqrt{EJ}\,H^{-1}\sqrt{EJ}\,C'. \qquad (32)$$


From (5) and (24) it is apparent that:


$$Y_1C_1' = YC' = Z. \qquad (33)$$


Since this procedure yields the same Z, it also
yields the same L and, by the operation LZL,
the same $\Phi_p$. Let:


$$Y_1'L = A_1. \qquad (34)$$
Then
$$A_1 = H^{-1}\sqrt{EJ}\,C'L = (\sqrt{EJ})^{-1}A. \qquad (35)$$
Consequently:
$$GA_1 = SA = S_p, \qquad (36)$$
which follows from (6), (35), and (17).


Again, checks on the calculation can be de- 
veloped. It can be shown, for example, that: 


$$ES_p = C_1'L \qquad (37)$$
and
$$EP_p = C_1'L\Phi_p^{-1}. \qquad (38)$$
Equation (37) is a recombination of (30), (35),
and (36) plus a pre-multiplication by E. Equation
(38) follows from the definition of (23).








SOME EMPIRICAL ASPECTS OF THE SE- 
QUENTIAL ANALYSIS TECHNIQUE AS 


APPLIED TO AN ACHIEVEMENT 
EXAMINATION 


WILLIAM J. MOONAN 
University of Minnesota 


Introduction

With the presentation of the early paper¹ on
sequential analysis by Dr. Wald, a new frontier
in modern Statistics was achieved. Prior to
those writings it had been shown by Professor
R. A. Fisher that the proper place for determining
the size of sample needed in an experiment
was in the planning of the experiment itself,
that is, before the experiment was executed.
It is well known that previous to Fisher’s work,
most thought about the proper sample size took
place after the experiment was run. We note today
the metamorphosis of the determination of
sample size, for with Wald’s writing we see that
under certain circumstances ‘‘n’’ is no longer
to be regarded as a fixed quantity but as a random
variable, and that the designation of the
proper sample size is made by the experiment
itself. It is in the sense that the statistical cogitation
about ‘‘n’’ becomes a part of the experiment
rather than a prelude or post-mortem of
it, that the marvel of the sequential procedure
is noted.

The use of sequential analysis in educational
problems has been undertaken by at least two
authors.²,³ Dr. H. M. Walker has made a contribution
in the field of item analysis, and Dr.
Cowden used sequential analysis in administering
an examination.

The Sequential Procedure

The main purposes of this paper are to demonstrate
the applicability of the sequential technique
to the scoring of achievement examinations
and to present some empirical results of an experiment
of such an application. The first discussion
may be regarded as a more detailed discussion
of Dr. Cowden’s paper although the experiment
contains different essentials and objectives.

The basic characteristics of the sequential
method consist of dividing the total sample space
by two lines, $d_1$ and $d_2$, into three mutually exclusive
zones on the basis of making three possible
decisions:
1. The lot (or student) is acceptable.
2. There is insufficient evidence to warrant
a decision of acceptability or unacceptability
of the lot.
3. The lot is unacceptable.

If a particular lot happens to fall in zone (2), the
sampling procedure is continued until a decision
of acceptability or unacceptability is reached by
the sampling of additional items from the lot supply.

Figure 1 is a graphical representation of the
sample space. The x axis is usually denoted as
the number of items examined and the y axis is
the number of defects observed among these
items.

The slope and intercepts of the boundary
lines, $d_1$ and $d_2$, are determined by the quality
tolerance limits and the preassigned risks.
Particular points determined by the number of
items inspected and the number of defects observed
are plotted. As long as these points remain
within the area limited by $d_1$ and $d_2$, inspection
of items of a particular lot is continued.
Inspection terminates when a particular
point falls in the area above $d_1$ or below $d_2$.

Consider more intimately the details of the
sequential procedure simultaneously with a numerical
example so that its processes may be
clear. Let our problem be to decide whether a
student should be passed or failed according to
his performance on an objective examination.
Now ordinarily, we would set some proportion
of wrong answers less than which he would have
to attain in order to pass. That is, his unknown
proportion, p, of failure should be compared
with this specified proportion and his disposition
arranged accordingly. Practically we can consider
two values, $p_0$ and $p_1$, such that if $p \geq p_1$
then we would be passing an undeserving individual,
and if $p \leq p_0$ we would be failing a more deserving
student.

Also consider the risks of making a wrong
decision about a student. Let us not be extremely
strict, but allow $\alpha$ to be the probability of failing
a student whenever $p \leq p_0$, and $\beta$ to be the
probability of passing a student whenever $p \geq p_1$.
For a numerical example take $p_1 = 45/75$, $p_0 = 35/75$,
and $\alpha = \beta = .1$. We are then sampling
with a probability of one in ten of passing a student
who makes more than a 45/75 proportion of
mistakes, and sampling with that same probability
of failing a student who makes less than a
35/75 proportion of errors. The quantities $p_1$,
$p_0$, $\alpha$, and $\beta$ define our sequential plan and are
called quality tolerance limits and preassigned
risks.

In general, the sequential method says to consider
a likelihood function, $L_n$, which consists
of the ratio of the probability of obtaining a sample
under two hypotheses. Our hypotheses are:
$H_0: p = p_0$ and $H_1: p = p_1$, where p is the probability
that a random variable, say x, which
can take only two values, 1 or 0, equals one.
In this case we have an incorrect response.
For a correct response, x = 0, and the probability
of this is 1 - p. Therefore, we have defined
the distribution of x for two values, i.e.
f(1, p) = p and f(0, p) = 1 - p. The likelihood
equation is written:


$$L_n = \frac{f(x_1, p_1)\, f(x_2, p_1) \cdots f(x_n, p_1)}{f(x_1, p_0)\, f(x_2, p_0) \cdots f(x_n, p_0)}.$$


It is usually more convenient to work with the
logarithm of the likelihood function because the
calculations reduce to summations rather than
products.
Define $z = \log \frac{f(x, p_1)}{f(x, p_0)}$. For our particular
case then, $z = \log \frac{p_1}{p_0}$ if the student answers
the question wrong, and $z = \log \frac{1 - p_1}{1 - p_0}$ if he
answers correctly. Therefore,


$$z_i = x_i \log \frac{p_1}{p_0} + (1 - x_i) \log \frac{1 - p_1}{1 - p_0}, \qquad i = 1, 2, \ldots, n,$$

and $\log L_n$ becomes

$$\sum_{i=1}^{n} z_i = \sum_{i=1}^{n} \left[ x_i \log \frac{p_1}{p_0} + (1 - x_i) \log \frac{1 - p_1}{1 - p_0} \right]$$

$$= \log \frac{p_1}{p_0} \sum_{i=1}^{n} x_i + n \log \frac{1 - p_1}{1 - p_0} - \log \frac{1 - p_1}{1 - p_0} \sum_{i=1}^{n} x_i$$

$$= \sum_{i=1}^{n} x_i \left[ \log \frac{p_1}{p_0} - \log \frac{1 - p_1}{1 - p_0} \right] + n \log \frac{1 - p_1}{1 - p_0}.$$


Here $\sum_{i=1}^{n} x_i$ is the total number of incorrect
responses among n questions, and n is to be considered
as a variable.

Wald’s theory tells us that if, in the course
of sampling, $\sum_{i=1}^{n} z_i$ ever exceeds or is equal
to a certain constant, log A, where $A = \frac{1 - \beta}{\alpha}$,
then the sampling process ceases and we shall
accept $H_1$. However, if $\sum_{i=1}^{n} z_i$ is ever less than
or equal to log B, where $B = \frac{\beta}{1 - \alpha}$, then the
sampling process ceases and we shall accept
$H_0$. If $\sum_{i=1}^{n} z_i$ takes values between these two
extremes, the sampling process continues until
a decision is reached or until some practical
truncation point is attained. Then for our numerical
example,


$$\log \frac{1}{9} < \sum_{i=1}^{n} x_i \left[ \log \frac{9}{7} - \log \frac{3}{4} \right] + n \log \frac{3}{4} < \log 9.$$


We are especially interested in simultaneous
values of $\sum_{i=1}^{n} x_i$ and n which
will make the center expression equal or exceed
log 9 and equal or fall below log (1/9).
By considering only the equalities we can
find the criterion points for decision by setting
$\sum_{i=1}^{n} x_i$ equal to $d_1$ for the right hand member and
equal to $d_2$ for the left. After some simplification
and transposition we find we have two linear
equations:⁴ $d_1 = .534n + 4.08$ and $d_2 = .534n - 4.08$.
To see how the sampling process operates,
consider Figure 2.


4. Strictly speaking, we have a set of discontinuous points.


Figure 2. Illustrative Graph of the Sampling of
a Student for the Sequential Test
$p_0 = 35/75$, $p_1 = 45/75$, $\alpha = \beta = .1$


TABLE I

FORMULAE AND CALCULATIONS FOR ASN CURVE

[The ability labels below are inferred from the form of each formula; the calculated values of the original table are not recoverable from the source.]

For Students of Ability: Formula for Expected Value of n

p = 0:
$$E(n) = \frac{\log B}{\log \frac{1 - p_1}{1 - p_0}}$$

p = $p_0$:
$$E(n) = \frac{(1 - \alpha)\log B + \alpha \log A}{p_0 \log \frac{p_1}{p_0} + (1 - p_0)\log \frac{1 - p_1}{1 - p_0}}$$

p = s:
$$E(n) = \frac{-\log A \,\log B}{\log \frac{p_1}{p_0}\,\log \frac{1 - p_0}{1 - p_1}}$$

p = $p_1$:
$$E(n) = \frac{\beta \log B + (1 - \beta)\log A}{p_1 \log \frac{p_1}{p_0} + (1 - p_1)\log \frac{1 - p_1}{1 - p_0}}$$

p = 1:
$$E(n) = \frac{\log A}{\log \frac{p_1}{p_0}}$$


Figure 3. Average Sample Number Curve for
Sequential Test
$p_0 = 35/75$, $p_1 = 45/75$, $\alpha = \beta = .1$


The two lines d, and d, are drawn by the 
slopes and intercepts of their respective equa- 
tions. Sampling a student diagrammatically 
consists of always moving to the right from the 
zero point, but drawing the line horizontally if 
he gets the item correct or at 45° if he errs. 

The case in Figure 2 shows that the student was
passed under the conditions of $H_0$, and that his
examination terminated with 19 questions asked.

Rather than use the graphic method of sequential
sampling, one can solve for the values of $d_1$
and $d_2$ for various values of n and record them
in a table and compare them with the number of
errors made by an individual.
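
The tabular approach just described can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's procedure: the function names `sequential_plan` and `score_sequentially` are assumptions introduced here, and the slope and intercepts are obtained by solving the two boundary equalities of the log likelihood ratio for the number of errors.

```python
import math

def sequential_plan(p0, p1, alpha, beta):
    """Slope s and intercepts h1, h2 of d1 = s*n + h1 and d2 = s*n - h2."""
    log_a = math.log((1 - beta) / alpha)       # upper limit, log A
    log_b = math.log(beta / (1 - alpha))       # lower limit, log B (negative)
    g1 = math.log(p1 / p0)                     # weight of a wrong answer
    g2 = math.log((1 - p0) / (1 - p1))         # minus the weight of a right answer
    s = g2 / (g1 + g2)
    h1 = log_a / (g1 + g2)
    h2 = -log_b / (g1 + g2)
    return s, h1, h2

def score_sequentially(errors, s, h1, h2):
    """errors is a list of 0/1 item results (1 = wrong) in order asked.

    Returns (decision, questions used, errors so far)."""
    wrong = 0
    for n, x in enumerate(errors, start=1):
        wrong += x
        if wrong >= s * n + h1:        # on or above d1: accept H1, fail
            return "fail", n, wrong
        if wrong <= s * n - h2:        # on or below d2: accept H0, pass
            return "pass", n, wrong
    return "undecided", len(errors), wrong

s, h1, h2 = sequential_plan(35 / 75, 45 / 75, 0.10, 0.10)
print(round(s, 3), round(h1, 2), round(h2, 2))

# Two extreme hypothetical students: one never errs, one always errs.
print(score_sequentially([0] * 75, s, h1, h2))
print(score_sequentially([1] * 75, s, h1, h2))
```

For the plan of the numerical example the computed constants round to the slope .534 and intercepts 4.08 quoted in the text. A student whose record stays between the two lines through all 75 items is reported as "undecided", which corresponds to reaching the practical truncation point.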

It would be of value to know how many questions
on the average would be required to be administered
to a student who has a certain proficiency.
Generation of the average sample
numbers for all levels of ability for a certain
sequential plan would produce a curve which is
known as the average sample number curve, or
briefly, as the ASN. The magnitude of E(n)
will of course depend on $p_0$, $p_1$, $\alpha$, and $\beta$. ASN’s
for five types of students are easily found by
the formulae in Table I. Additional points may
be found, but require more calculation. A diagram
of the ASN curve for the particular sequential
test under consideration is shown in
Figure 3. The maximum value of E(n) occurs
in the neighborhood of the ordinate which corresponds
to the abscissa, p = s.⁵
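
As an illustration, the five easily found ASN points can be evaluated for the plan of the numerical example. The sketch below assumes the standard approximate formulas for a binomial sequential test at p = 0, $p_0$, s, $p_1$, and 1; the function name `asn_points` and the dictionary keys are hypothetical labels introduced here.

```python
import math

def asn_points(p0, p1, alpha, beta):
    """Approximate expected number of questions at five ability levels."""
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    z_wrong = math.log(p1 / p0)                # increment for a wrong answer
    z_right = math.log((1 - p1) / (1 - p0))    # increment for a right answer
    return {
        "p=0": log_b / z_right,
        "p=p0": ((1 - alpha) * log_b + alpha * log_a)
                / (p0 * z_wrong + (1 - p0) * z_right),
        "p=s": (-log_a * log_b) / (z_wrong * (-z_right)),
        "p=p1": (beta * log_b + (1 - beta) * log_a)
                / (p1 * z_wrong + (1 - p1) * z_right),
        "p=1": log_a / z_wrong,
    }

asn = asn_points(35 / 75, 45 / 75, 0.10, 0.10)
for level, value in asn.items():
    print(level, round(value, 1))
```

Under these assumptions the largest of the five values occurs at p = s, in agreement with the statement that the maximum of E(n) lies near that abscissa.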

Any sequential testing procedure also has an
operating characteristic function which is determined
by the parameters of the test. This
curve shows the probability of
passing the examination under the specifications
of the parameters of the sequential test for
any value of p. Four points of this curve are
readily known since, if a student’s proficiency
is of level p = 0, his probability of passing is
certainly 1; if p = 1, $L_p = 0$; if $p = p_0$,
$L_p = 1 - \alpha = .9$; and if $p = p_1$, $L_p = \beta = .1$.
A fifth point may be easily
found, for if p = s = .534,
$L_s = \frac{\log A}{\log A - \log B} = .50$.
This curve is shown in Figure 4. Details
of the derivation and calculation of the
ASN and OC curves are given in references 2,
4, and 5.
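
Points between the five named above can be generated from the usual parametric approximation to the operating characteristic of a binomial sequential test, in which an auxiliary parameter h traces out both p and the probability of passing. The sketch below is an illustration under that assumption; the function name `oc_point` is hypothetical, and the values h = 1, 0, and -1 recover the points at $p_0$, s, and $p_1$.

```python
import math

def oc_point(h, p0, p1, alpha, beta):
    """Return (p, probability of passing) for auxiliary parameter h."""
    a = (1 - beta) / alpha
    b = beta / (1 - alpha)
    r1 = p1 / p0
    r0 = (1 - p1) / (1 - p0)
    if h == 0:
        # Limiting case: p equals the slope s of the decision lines.
        p = math.log(1 / r0) / (math.log(r1) + math.log(1 / r0))
        return p, math.log(a) / (math.log(a) - math.log(b))
    p = (1 - r0 ** h) / (r1 ** h - r0 ** h)
    prob_pass = (a ** h - 1) / (a ** h - b ** h)
    return p, prob_pass

for h in (2, 1, 0.5, 0, -0.5, -1, -2):
    p, prob = oc_point(h, 35 / 75, 45 / 75, 0.10, 0.10)
    print(round(p, 3), round(prob, 3))
```

Large positive h approaches the point (0, 1) and large negative h approaches (1, 0), so a modest grid of h values is enough to sketch the whole curve.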

We have now concluded an exposition of the 
technique of sequential analysis. It must be re- 
membered that the discussion provided is an 
illustration of only one problem which may be 
handled by the theory (see reference 4). Complete
discussions of other methods are available
in Wald’s text. The numerical adaptation of
these processes has also been specific and is 
actually a part of the experiment which shall 
now be discussed more fully. 












The Experiment 





A critical person might well conjecture as to 
the reliability of an examination which is ad- 
ministered by sequential analysis. Of course, 
the most practical standard to which a sequen- 
tial score must be compared would be the score 
which an individual would receive if the com- 
plete examination were administered. It was the 
purpose of this experiment to estimate the con- 
sistency of scores for an achievement examina- 
tion administered sequentially and totally. 

To effect this end, an achievement examination
in descriptive statistics was given to a
class of thirty-nine graduate students in the
College of Education at the University of Minnesota.
The examination consisted of seventy-five
5-choice multiple choice questions on various
topics of descriptive statistics. These
questions have been found to have a reliability
coefficient of .87 and are considered to be valid
by the students and the administrators.

At this point it may be advisable to point out 
some of the distinctions between Dr. Cowden’s 
experiment and this one. In the first place, our
examination, while not as lengthy as his, consisted
of the 5-choice multiple choice type of
question rather than the alternate-response
type which he used. True-false items can
easily be negatively discriminating, because
superior students usually are aware of evidential
exceptions to the statement. The absolute
truth or falsity of a statement, too, is not always
apparent. Also, the probability is .5 that
a student answers a question correctly even 
when he knows nothing about it. Consequently, 
some corrections for chance success should be 
made in order to increase validity. For the 
above reasons, alternate response items were 
not considered the best to use in this experi- 
ment, and because the probability of chance suc- 
cess is thereby substantially reduced, no cor- 
rection for guessing was applied in scoring the 
test. 

Another difference occurs in the selection of 
the order of items. In this experiment the ques- 
tions were selected with the aid of a table of 
random numbers. No attempt at stratification 
was made. This was done so that the conditions 
of the sequential method could be more closely 
adhered to. 

The usual practical requirement of grouping
the questions into ‘‘rounds’’ was not met
either. In this experiment it was not necessary
to group the items because the test was not
administered sequentially, but only corrected that
way.

After the test was administered, note was
taken for each pupil whether he missed each of
the seventy-five questions or whether he passed
it. These data were laid out in a form similar
to an item analysis sheet. Past experience had
given sufficient evidence in regard to what
should be considered passing and failing levels
in terms of the percent of items missed for the
seventy-five questions. Next, consideration was
given to the preassigned risks. An attempt was
made to vary these constants somewhat, but
still keep in mind the practicalities of the circumstances.
Of all the possible combinations
of these constants, five sequential tests
were finally selected. The results of these selections
are shown in Table II. We shall refer
to these five tests as $S_i$, that is $S_1$, $S_2$, $S_3$, $S_4$,
and $S_5$.


5. Here ‘‘s’’ is taken to mean the slope of the decision lines, i.e., p = s = .534.

6. R. A. Fisher and F. Yates. Statistical Tables for Biological, Agricultural,
and Medical Research, Table XXXIII (London: Oliver and Boyd, 1949).


Figure 4. Operating Characteristic Curve for
Sequential Test
$p_0 = 35/75$, $p_1 = 45/75$, $\alpha = \beta = .1$

Figure 5. Scatter Diagram for $R_1$ and $S_2$

Figure 6. Scatter Diagram for $R_1$ and $R_2$


The equations, $d_1 = sn + h_1$ and $d_2 = sn - h_2$,
were solved for each test for values of n from 1 to
75. As mentioned previously, a random sequence
of the test items was effected and for
each sequential test, all the students’ answers
were sequentially scored.

Under ordinary sampling inspection, if an
article under examination did not meet the specification
imposed by $H_0$, it would be rejected as
faulty. We are altering the method somewhat
by not absolutely failing a student, or for that
matter, absolutely passing him. We shall qualify
this dichotomization by calculating his percent
of correct response for each sequential
test. For instance, suppose that Student 1 was
being sampled according to the conditions imposed
by $S_1$, and that, after 38 questions were
considered he had made only 16 mistakes.
Thirty-eight questions were the minimum number
on which we could make a decision about
him under either hypothesis. In this case, he
was accepted under $H_0: p = p_0$. His percent of
correct response is found by forming the ratio
(38 - 16)/38 = 57.89 percent. Such ratios were found for
each student for each of the sequential tests.
Having calculated each student’s percent of correct
responses on the whole 75 questions, which
we shall term generally as $R_1$, we can find with
them the product moment correlation coefficients
between $R_1$ and each $S_i$. As it happened,
the ASN for each $S_i$ equalled about 40. To have
another basis of comparison and control, we
introduced a test termed $R_2$, which represents
the percent of correct response for each student
based on exactly 40 items. These correlations
were all of magnitude .90 with the highest being
$R_1R_2$ = .93 and the lowest, $R_1S_5$ = .87.
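
The worked case of Student 1 can be verified with a short calculation. The sketch below is illustrative only: it uses the rounded slope and lower intercept listed for test $S_1$ in Table II, the function names are hypothetical, and it assumes for the second check that the sixteenth error had already occurred by question 37.

```python
# Rounded Table II constants for test S1 (slope and lower intercept).
s, h2 = 0.534, 4.08

def lower_line(n):
    """Acceptance boundary d2 = s*n - h2 after n questions."""
    return s * n - h2

def percent_correct(n_questions, n_errors):
    """Percent of correct response on the questions actually scored."""
    return 100.0 * (n_questions - n_errors) / n_questions

errors = 16
accept_at_38 = errors <= lower_line(38)   # boundary is about 16.21
accept_at_37 = errors <= lower_line(37)   # boundary is about 15.68
score = percent_correct(38, errors)

print(accept_at_38, accept_at_37, round(score, 2))
```

With 16 errors the record first touches the acceptance region at the 38th question, and the sequential score is the percent correct on those 38 questions only.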

Scatter diagrams showing the relationship between
$R_1$ and $S_2$ and between $R_1$ and $R_2$ are given in Figures
5 and 6. The test $S_2$ was selected because it
gives the most representative diagram for all
the $R_1S_i$ relationships. Figure 5, upon close examination,
can be seen to be made of three parts:
Part I: Points consisting of low scores for
$R_1$ and $S_2$.
Part II: Points consisting of high linear relationship
between $R_1$ and $S_2$.
Part III: Points consisting of high scores
for $R_1$ and $S_2$.


Those points in Part I represent students
who were rejected under $H_0$ early in the sequential
process and have low percent correct response
ratios. However, Part III represents
students who were accepted by $H_0$ early in the sequential
process and have high percent correct
response ratios. The segregation of Part I and
Part III is not as apparent in Figure 6. This is
evidence that the sequential scoring method is
accomplishing a discriminatory function.


Some additional information might be gained
by comparing the means of $R_1$, $R_2$, and $S_i$.
These significance tests may be most expeditiously
carried out with the analysis of variance.
A summary of this analysis is given in
Table III.


Summary


This experiment consisted of four different
aspects. The first was the administration
of a reliable and valid achievement examination
in descriptive statistics to thirty-nine
graduate students. Part two consisted of
the setting up of five sequential tests which
could be practically applied to the achievement
examination in order to test a student according
to two hypotheses about his ability. The
purpose was to sample the student under the
restrictions imposed by the sequential test and
accept him under one of the hypotheses as soon
as a statistical conclusion could be made. In
the third part we corrected the examination for
each student under each sequential test. The
last part concerned the analysis of the data.
This analysis is shown in various tables and
figures. Armed with that information and the
experience gained from conducting this experiment,
a few conclusions will be given.

1. Sequential analysis has its greatest util- 
ity in sampling data which comes serially and 
where the expense of sampling is a function of 
the number of items sampled. Some educational
data may be obtained serially if so desired,
but usually the expenses involved in sampling 
are more or less independent of the number of 
items involved. This is especially true of data 
secured from achievement examinations. 

2. The sequential technique could be used in
practical cases of achievement testing when it
is desired to spot quickly those students who
have high or low ability in some field. Such
cases would occur occasionally in ordinary
classroom procedure when it is desired to have
students with high ability demonstrate their
facility early and spend time ordinarily spent
on the examination on more profitable experience.
Students with low ability could spend
their extra time to gain further understanding
or with remedial teaching. Examinations
scored sequentially could be used efficiently in
the field of counseling when it would be desired
to ascertain a variety of abilities in the most
efficient manner. Another adaptation could be
made in the field of intelligence testing. This
technique would be valuable in testing those
special cases where the examiner deals with a
potentially uncooperative individual or those of
low or high intelligence. Of course, a large
amount of work would be required by test makers
in order to validate this procedure.


TABLE II

THE QUALITY TOLERANCE LIMITS, PREASSIGNED RISKS, SLOPE AND INTERCEPTS
FOR SEQUENTIAL TESTS USED IN THIS EXPERIMENT

S_i    p_0      p_1      α      β      s       h_1     h_2
S_1    35/75    45/75    .10    .10    .534    4.08    4.08
S_2    25/75    45/75    .02    .02    .465    3.54    3.54
S_3    25/75    40/75    .05    .05    .431    3.56    3.56
S_4    30/75    40/75    .10    .10    .466    4.08    4.08
S_5    35/75    45/75    .10    .20    .534    3.86    2.79


TABLE III

THE ANALYSIS OF VARIANCE

Source of Variation         df     ss           ms         F        P
Between Students            38     66428.858    1741.28    60.61    P < .01
Between Scoring Methods      6      1037.039     172.84     6.02    P < .01
Error                      228      6550.324      28.73
TOTAL                      272     74016.221


3. Figures 5 and 6 indicate that the sequential
tests are accomplishing a differential effect
between students of high and low ability.

4. Product moment correlations were calculated.
These values are homogeneous for
each $S_i$ and within themselves, and average
around .90, which indicates a high degree of
consistency between the criterion and $S_i$.

5. Tests of the homogeneity of the frequency
distributions of $R_2$ and $S_i$ with $R_1$ were found
significant for all $S_i$ and not significant for $R_2$,
thus showing that the frequency distributions of the $S_i$
are not of the same type as the criterion distribution.

6. The last empirical detail involved the analysis
of variance. We note especially from
Table III that there exist significant differences
between the means of the scoring methods.
Let us examine this significance more
closely. After constructing a ‘‘Student’s’’ t-test
for the significance of the difference between
two means, we find that in order to have
such a significant difference at the 5 percent
level of probability, there must be a difference
of 1.68 units between the means. The mean
values for these tests are given now:


$R_1$ = 53.94    $R_2$ = 53.40
$S_1$ = 57.05    $S_2$ = 56.39
$S_3$ = 55.92    $S_4$ = 55.71
$S_5$ = 59.77


Thus we see that this difference of 1.68 units
exists between $R_1$, $R_2$, and $S_i$ and within the $S_i$
themselves. We also see that although we get
high correlations, the $S_i$ tests define different
populations than our criterion. Hence, for purposes
of estimating the mean, this particular
method is not effective. Nevertheless, we must
remember that such was not our purpose; if it
were, then another sequential method would be
utilized.

7. In addition to the above mentioned places 
for educational uses of sequential analysis, it 
may be of value for answering these questions: 


a. What is the optimum number of items 
needed to enable the greatest number of 
individuals to finish the test in a given 
time for given precision limits? 


b. Is it possible to discover any speed-power 
relationship for a test? 


c. What is the optimum number of items
needed to differentiate between, say, the
upper and lower thirds of the group tested?


8. As a consequence of these observations, 
it is apparent that the sequential technique may 
be used with achievement examinations with 
some reservations, and may hold some promise 
for solving other problems. 
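
The comparison of means described in item 6 can be checked directly. The following is a minimal Python sketch, assuming only the seven mean values and the 1.68-unit critical difference quoted above; the function name `significant_pairs` and the variable names are illustrative and are not part of the original study.

```python
from itertools import combinations

# Mean scores reported in item 6 (R = criterion tests, S = sequential tests).
means = {
    "R1": 53.94, "R2": 53.40,
    "S1": 57.05, "S2": 56.39, "S3": 55.92, "S4": 55.71, "S5": 59.77,
}

# Smallest difference between two means that is significant at the
# 5 percent level, as stated in the text.
CRITICAL_DIFFERENCE = 1.68


def significant_pairs(values, critical):
    """Return the pairs of tests whose means differ by at least `critical`."""
    return [
        (a, b)
        for a, b in combinations(values, 2)
        if abs(values[a] - values[b]) >= critical
    ]


pairs = significant_pairs(means, CRITICAL_DIFFERENCE)
for a, b in pairs:
    print(f"{a} vs {b}: difference = {abs(means[a] - means[b]):.2f}")
```

With these figures every Sj mean differs from both R1 and R2 by more than 1.68 units, R1 and R2 do not differ from each other, and within the Sj only S5 stands apart from the others.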
A study begun in September 1948 at Michigan
State College was concerned with the relationship
between success in Chemistry 101, a freshman
course requiring no high school chemistry, and
the ability to perform the fundamental arithmetic
operations and to read materials involving
chemistry. All entering chemistry students were
administered the ACE Psychological Examination
and the Michigan State College Chemistry
Pre-Test. At the completion of the course their
grades were correlated with their scores on the
two tests. The correlations are presented in
Table I.

These correlations show that success in the
course is related to the two abilities, i.e. to 
perform fundamental arithmetic operations and 
to read appropriate material. An attempt was 
then made to determine scores for these tests 
by which students could be selected for admis- 
sion to the course. Those failing to meet these 
standards could be given remedial work before 
enrolling in the course. Two methods of selec- 
tion were tried: (1) a minimum total score on 
the pre-test, and (2) a minimum score for each 
part of the pre-test. 

Since neither of the previous methods had 
proved satisfactory for selecting students, the 
reported investigation was undertaken. The 
technique of discriminant functions was em- 
ployed using three variables (1, 2). The three 
tests selected were: 

1. The raw score on Part I of the Chem- 
istry Pre-Test consisting of twenty-five 
arithmetic reasoning and computation 
items. 

2. The raw score on Part II of the test con- 
sisting of fifty-two reading items, re- 
quiring no previous chemistry but cover- 
ing chemistry material. 

3. The ACE total decile rank. 


The application of discriminant functions will
enable us to: (1) determine the relative value of
each test as a predictive factor, and (2) establish
a critical composite score of the three tests to be
used as a basis for selection. In deriving the
discriminant, a group of 979 students who had
completed the course was taken as the sample.
Since it was believed that sex differences would
operate in the performance on the tests, a
discriminant function was obtained for the men
and the women separately. Each sex was divided
into two groups on the basis of the chemistry
grade. The four resulting groups were:

A males receiving grades of A, B or C

B males receiving grades of D or F 

C females receiving grades of A, B or C 

D females receiving grades of D or F 


The students were classified on this basis 
rather than pass or fail since it was believed 
that D students as well as F students were in 
need of remedial work. 

The calculated measurements for the four
groups are summarized in Table II, where the
Part I score is denoted by $x_1$, the Part II score
by $x_2$, and the ACE total decile rank by $x_3$.

Let $x_i$ represent the score of the student on
test $i$. Let the linear function of the scores be

(1)  $X = \sum_i a_i x_i, \quad i = 1, 2, 3$

Let the difference between the means of $x_i$ be
represented by $d_i$, where $i = 1, 2, 3$. Let the
sum of squares or products from specific means
within classes be represented by $S_{ij}$, where
$i, j = 1, 2, 3$. Then for any linear function, $X$, of
the measurements, the difference between the
means of $X$ for the two groups is

(2)  $D = \sum_i a_i d_i, \quad i = 1, 2, 3$

The variance of $X$ within classes is proportional
to $S = \sum_i \sum_j a_i a_j S_{ij}$, where $i, j = 1, 2, 3$.
The constants $a_i$ can be found by maximizing the
ratio $D^2/S$, and they are proportional to the
solutions of the three normal equations (3), written
in matrix form:

(3)  $\begin{pmatrix} S_{11} & S_{12} & S_{13} \\ S_{21} & S_{22} & S_{23} \\ S_{31} & S_{32} & S_{33} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix} = \begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix}$

Making the substitution $L_i = a_i \sqrt{S_{ii}}$ ($i = 1, 2, 3$)
in equation (3) and multiplying the equation
on the left by the diagonal matrix whose elements
are $1/\sqrt{S_{ii}}$, we obtain
TABLE I

CORRELATIONS BETWEEN CHEMISTRY GRADE AND SEVERAL VARIABLES

                                                   Correlations
Variable                                           Male    Female

ACE Psychological Examination
    Quantitative decile rank                       .265    .269
    Linguistic decile rank                         .243    .340
    Total decile rank                              .307    .345

Michigan State College Chemistry Pre-Test
    Part I score                                   .484    .448
    Part II score                                  .486    .415
    Total score                                    .546    .521














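TABLE II

CALCULATED MEASUREMENTS FOR THE FOUR GROUPS OF THE STUDY

Measurement              Group A      Group B      Group D

N                            517          265           91
Sum of x1                  6,720        2,520          823
Sum of x2                 11,911        6,650        2,247
Sum of x3                  3,247        1,310          471
Mean of x1                   13.           9.         9.04
Mean of x2                   30.          25.        24.69
Mean of x3                    6.           4.         5.18
Sum of x1 squared         94,788       27,114        8,313
Sum of x2 squared        505,194      174,907       58,285
Sum of x3 squared         23,164        8,318        3,135
Sum of x1 x2             210,657       64,330       20,524
Sum of x1 x3              43,970       13,048        4,460
Sum of x2 x3             102,962       34,480       12,312

Differences between means, female:  d1 = 2.34    d2 = 4.20    d3 = 1.39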
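(4)  $\begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix} \begin{pmatrix} L_1 \\ L_2 \\ L_3 \end{pmatrix} = \begin{pmatrix} d_1' \\ d_2' \\ d_3' \end{pmatrix}$

where $d_i' = d_i/\sqrt{S_{ii}}$.

The coefficients $L_1$, $L_2$, $L_3$ can be obtained
by the relation $L = R^{-1}D'$, where $L$ and $D'$ are
the one-column matrices and $R^{-1}$ is the inverse
of the correlation matrix, $R$, in equation (4).
The $a_i$ can be computed by the relation
$a_i = L_i/\sqrt{S_{ii}}$ ($i = 1, 2, 3$).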

In Table III the computations leading to the
pooled sums of squares and products within the
two groups, A and B, are recorded. In the line
of totals, the entries are the combined sums of
squares and products for the 782 men. The line
for groups contains the sums of squares and
products of the group sums, which are calculated in
the usual manner. As examples, the entry for
column $x_1$ in row $x_1$ of Table III is

$$\frac{(6720)^2}{517} + \frac{(2520)^2}{265} = 111{,}310.7855$$

and for column $x_3$, row $x_1$,

$$\frac{(6720)(3247)}{517} + \frac{(2520)(1310)}{265} = 54{,}662.0780$$


The differences in the third line are the sums
of squares and products of deviations from
means within the groups. The calculation of the
standard deviations and the correlation coefficients
within the groups proceeds in the normal
manner. As examples,

$$s_{x_2} = \sqrt{\frac{23{,}552.6146}{780}} = \frac{153.4686}{27.92848} = 5.4951$$

$$r_{23} = \frac{4{,}639.9490}{(153.4686)(71.4177)} = .423339$$


The degrees of freedom used, 780, are those
within the two groups, (517 - 1) + (265 - 1).
Substituting the correlations from Table III
and the values of $d_1'$, $d_2'$, $d_3'$ into equation (4),
the following normal equations are obtained:

$$\begin{pmatrix} 1.000000 & .312590 & .320540 \\ .312590 & 1.000000 & .423339 \\ .320540 & .423339 & 1.000000 \end{pmatrix} \begin{pmatrix} L_1 \\ L_2 \\ L_3 \end{pmatrix} = \begin{pmatrix} .033912 \\ .037076 \\ .018763 \end{pmatrix}$$

The solution of this system of equations is
$L_1 = .025164$, $L_2 = .030071$, $L_3 = -.002033$. The
corresponding $a_i$ values are $a_1 = .000244516$,
$a_2 = .000195942$, and $a_3 = -.000028466$. The
composite function for the men from equation
(1) is

$$X = .000244516\,x_1 + .000195942\,x_2 - .000028466\,x_3$$
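
The solution of the normal equations for the men can be reproduced numerically. The sketch below is only an illustration, assuming the correlation matrix and right-hand side printed above; the helper `solve_linear_system` is a hypothetical name for a plain Gaussian elimination and is not the computing routine used in the study. Only the two square roots quoted in the worked examples (153.4686 and 71.4177) are used to convert back to the a values, so the first coefficient is not recomputed.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the inputs are left untouched.
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


# Within-group correlation matrix R and standardized differences d'
# for the men, as printed in the normal equations.
R = [
    [1.000000, 0.312590, 0.320540],
    [0.312590, 1.000000, 0.423339],
    [0.320540, 0.423339, 1.000000],
]
d_prime = [0.033912, 0.037076, 0.018763]

L = solve_linear_system(R, d_prime)

# Square roots of S22 and S33 as quoted in the worked examples of the text.
sqrt_S22, sqrt_S33 = 153.4686, 71.4177
a2 = L[1] / sqrt_S22
a3 = L[2] / sqrt_S33

print([round(v, 6) for v in L])
print(round(a2, 9), round(a3, 9))
```

The computed values agree with the printed L and a coefficients to the number of decimals reported.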


For the linear function, $X$, of the three tests,
the difference between the means of $X$ for the
two male groups is, as noted in equation (2),

$$D = (.000244516)(3.49) + (.000195942)(5.69) - (.000028466)(1.34) = .00193013$$

In order to test the significance of this difference,
the sum of squares of the composite $X$ is
analyzed into two parts, within the groups and
between the groups. The sum of squares within
groups is $D$. The between-groups sum of squares
is:

$$\frac{N_1 N_2}{N_1 + N_2} D^2 = \frac{(517)(265)}{782} (.00193013)^2 = (175.198209)(.00000372540) = .00065268$$

The analysis of variance is completed in Table
IV.
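
The arithmetic for the men can be verified with a few lines of Python. This is a minimal sketch using only the coefficients, mean differences, and group sizes given above; the function names `mean_difference` and `between_groups_ss` are illustrative.

```python
def mean_difference(a, d):
    """Equation (2): D is the sum of the products a_i * d_i."""
    return sum(ai * di for ai, di in zip(a, d))


def between_groups_ss(n1, n2, big_d):
    """Between-groups sum of squares: N1 * N2 / (N1 + N2) * D**2."""
    return n1 * n2 / (n1 + n2) * big_d ** 2


# Coefficients and differences between group means for the men.
a_men = [0.000244516, 0.000195942, -0.000028466]
d_men = [3.49, 5.69, 1.34]
n_a, n_b = 517, 265  # sizes of groups A and B

D = mean_difference(a_men, d_men)
ss_between = between_groups_ss(n_a, n_b, D)
ss_within = D  # the text states that the within-groups sum of squares is D

print(f"D = {D:.8f}")
print(f"between-groups SS = {ss_between:.8f}")
```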

To compute the critical composite score for
selection, the function $U = \tfrac{1}{2}(\bar{X}_A + \bar{X}_B)$ is used,
where $\bar{X}_A$ is the average composite score for
group A, and $\bar{X}_B$ is the average for group B
(Ref. 2). For the male measurements we
have

$$U = \frac{.00903104 + .00710091}{2} = .00806598$$





Any male student having a composite score 
less than U should receive remedial work be- 
fore entering Chemistry 101, and students hav- 
ing a composite score equal to or greater than 
U should be admitted to the course. 
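
The selection rule for the men can be sketched as a short Python function. The coefficients and the two average composite scores are those given above; the two students' scores are hypothetical values chosen only to show both outcomes, and the names `composite_score` and `recommend` are illustrative.

```python
# Coefficients of the composite function for the men.
A_MEN = (0.000244516, 0.000195942, -0.000028466)

# Average composite scores for groups A and B, as given in the text.
MEAN_X_A = 0.00903104
MEAN_X_B = 0.00710091

# Critical composite score: midpoint of the two group averages.
U_MEN = (MEAN_X_A + MEAN_X_B) / 2


def composite_score(x1, x2, x3, coefficients=A_MEN):
    """Composite X from Part I score, Part II score, and ACE decile rank."""
    a1, a2, a3 = coefficients
    return a1 * x1 + a2 * x2 + a3 * x3


def recommend(x1, x2, x3):
    """Apply the rule: below U means remedial work, otherwise admit."""
    return "admit" if composite_score(x1, x2, x3) >= U_MEN else "remedial work"


# Hypothetical students (not taken from the study's data).
strong_student = (15, 35, 7)
weak_student = (8, 20, 4)

print(f"U = {U_MEN:.8f}")
print(recommend(*strong_student))
print(recommend(*weak_student))
```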

Applying a similar procedure to the measurements
for groups C and D in Table II, we obtain
the composite function for the women,

$$X = .000801278\,x_1 + .000462742\,x_2 + .000251716\,x_3$$

The difference between composite means of the
two groups is

$$D = (.000801278)(2.34) + (.000462742)(4.20) + (.000251716)(1.39) = .00416839$$

The between sum of squares is

$$\frac{(106)(91)}{197} (.00416839)^2 = (48.964467)(.0000173755) = .000850782$$

The analysis of variance for groups C and D is
completed in Table V. The critical score for
selecting women is $U = .0220567$.

It may be concluded from these results that 
the linear compound of the three tests distinguishes
significantly between students obtaining
grades of A, B, or C in Chemistry 101 and
those obtaining grades of D or F. The fact
that the ratios of the weightings for the three
tests are different for each sex substantiates
the hypothesis that the sex factor is of some
importance in selection.

The relative value of the three tests for
distinguishing between groups is indicated by the
coefficients in the composite function. For
each sex the coefficients indicate that success
in Chemistry 101 is most closely related to the
score on Part I of the Pre-Test, less closely
related to the score on Part II of the Pre-Test,
and least closely related to the ACE total decile
rank. The use of the decile rank instead of the
raw score on the ACE undoubtedly lowered the
value of that test for selection, but the author
feels that this fact made little difference in the
final results of the investigation. In the selection
of the men the ACE proved relatively unimportant
and could be eliminated as a selective
factor.
EFFECT OF THE ORGANIZATION OF
LEARNING EXPERIENCES UPON THE
ORGANIZATION OF LEARNING OUTCOMES.
I. STUDY OF THE PROBLEM BY MEANS


THE PROBLEM 


Learning as organization 


Contemporary theory considers learning as 
a process involving a change in the organiza- 
tion of behavior rather than as the acquisition 
of a multitude of discrete reactions (8:326). 
This implies that organization is essentially an 
integration of behavior patterns so that the in- 
dividual is able to relate various reactions in 
adjusting to or solving problem situations of 
one kind or another. 

The question arises as to how, practically, 
one may obtain evidence as to the extent of or- 
ganization or integration of behavior. A use- 
ful way is to determine both the intercorrela- 
tions among tests of the various outcomes un- 
der consideration and the factor patterns under- 
lying performance on the tests. The first 
method gives an indication of the consistency of 
individual performance from one outcome to 
another, while the other provides information 
on the broader patterns into which learning is 
organized. These methods thus define the “or- 
ganization of learning outcomes” in statistical 
terms. It is this approach which was utilized 
in the present study. 
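
The first of these methods, the table of intercorrelations among tests, can be illustrated with a short Python sketch. The test names and scores below are hypothetical and serve only to show the computation; `pearson_r` and `intercorrelations` are illustrative names, and the product-moment formula is the standard one rather than anything specific to this study.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def intercorrelations(scores):
    """Correlation of every test with every other test, keyed by name pair."""
    names = list(scores)
    return {(p, q): pearson_r(scores[p], scores[q]) for p in names for q in names}


# Hypothetical scores of five students on three tests.
scores = {
    "vocabulary": [12, 15, 9, 20, 17],
    "history_facts": [30, 34, 25, 41, 36],
    "data_interpretation": [8, 5, 9, 4, 7],
}

table = intercorrelations(scores)
for (p, q), r in table.items():
    if p < q:  # print each distinct pair once
        print(f"{p} x {q}: r = {r:.3f}")
```

High positive entries in such a table indicate consistent individual performance from one outcome to another, which is the sense of "organization" used here.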


Importance of organization 





Organization is important from the stand- 
point of the individual learner in at least two 
respects. First, the extent to which various 
outcomes are organized may affect the individ- 
ual’s ability to generalize his learning from 
one content field to another and to broader areas 
of everyday life. Without such organization, the 
possibilities of transfer and generalization are 
greatly limited and the individual’s behavior 
patterns may remain relatively compartmental- 
ized. Thus he may be effective in solving prob- 
lems in one kind of situation but not in another. 
And even in problem situations of a similar 
kind he may not be able to interrelate effectively
the kinds of reactions required for a satisfactory
adjustment or solution. Secondly, the
permanence of learning is likely to be greatly 
enhanced if the various outcomes have acquired 
some interrelation in behavior. Isolated skills 
and items of information tend to be forgotten 
rapidly but those aspects of behavior which bear 
a functional relationship to other aspects have
a much greater probability of being called into
use periodically and of being reinforced. It is
known from research studies that those learning 
outcomes which are continually reinforced and 
progressively developed into more generalized 
modes of reaction are most likely to survive in 
the long run. 

It would seem, therefore, that the organiza- 
tion of behavior should be an important objec- 
tive of education in its own right. This is tant- 
amount to saying that it should not be left to 
chance. Some form of organization (or disorg- 
anization) of learning is likely to emerge from 
a series of learning experiences. If the effec- 
tive organization of learning outcomes is de- 
sired and sought, then provision should accord- 
ingly be made for fostering the kind of organi- 
zation desired. 

This point of view has important implications 
for educational theory and practice. The ac- 
ceptance of organization or integration as an 
important objective of education would necess- 
itate, in the first place, the selection of learn- 
ing experiences which will contribute effective- 
ly to its development. Even more important 
than the selection of the experiences themselves 
are the ways in which these experiences are 
presented to students. Consideration must not 
only be given to the integration of experiences 
over various content areas but also to the se- 
quential development of outcomes over a period 
of years. Both considerations must be taken 
into account if learning experiences are to
reinforce each other over a period of time and
from one content area to another. A final
important implication is the idea that educational
evaluation should attempt to appraise the extent
to which the desired organization of behavior
has actually been brought about in individual
students.

The problem of organizing learning experi- 
ences so as to bring about desired changes in 
the organization of behavior is not an easy one. 
Traditionally, learning experiences have been 
presented in a ‘‘logical’’ organization, that is, 
a systematic treatment of subject fields. This 
form represents the scholar’s way of organiz- 
ing knowledge. It has the advantage of provid- 
ing a convenient basis for professional special- 
ization of teaching personnel. It has the disad- 
vantages of being artificial from the standpoint 
of the learner and of facilitating the compart- 
mentalization of learning; relationships between 
ideas from different fields are difficult to bring 
out. Frequently, attempts are made to remedy
these shortcomings by stressing in several
subject fields such common objectives as reading
skills, critical thinking, and writing skills, or
by providing for the application of ideas learned
in one field to situations in other fields. Other
attempts to provide for the integration of learning
experiences include the formal correlation
of two or more subjects, the organization of
subjects in the same area into broader fields,
and the core or unified curriculum. Not much
is known about the effectiveness of these various
solutions in bringing about desirable changes in
the organization of behavior. In fact, little is 
known about changes in the organization of com- 
plex learning outcomes over an extended period 
of time, let alone the influences of the experi- 
ences underlying the changes. 

Although the organization of learning outcomes
seems to constitute an important problem,
comparatively few studies have investigated
it by the method of factor analysis (2, 3,
4, 5, 10, 11, 12, 14, 15). Only one of them
compares factor patterns before and after an
extended period of relatively complex training,
and this one investigates the interrelations
among a set of rather specific behaviors
appraised in but a single course, Freshman
Chemistry (11). The other studies of achievement
analyze the organization of learning as
an end result of a particular period of instruction.
They furnish no information as to whether
the pattern of organization existing at the
beginning of the period of instruction changed as
a result of instruction or whether it remained
substantially stable.


Significance of present study 





The significance of the present study lies 
both in the problem and its methodology. The 
problem is a neglected one. There have been 
many studies of growth in a single ability and 
of parallel growth in several abilities such as 
those in reading. Some attention has also been
paid to the pattern or profile of achievement as
indicated by the intercorrelations among tests 
of various abilities. A few studies have also 
dealt with the pattern of intercorrelations be- 
fore and after a period of learning. However, 
as indicated above, practically no research has 
dealt with changes in the pattern or organiza- 
tion of learning outcomes after a period of time 
and no attempt has been made to evaluate the 
influences of the organization of curricular ex- 
periences upon the organization of learning out- 
comes. From the standpoint of methodology, 
the present study is significant because it util- 
izes the method of factor analysis to compare 
changes in the pattern of organization itself be- 
fore and after an extended period of relatively 
complex training. 


Problem investigated 





The problem of this study was to determine 
the extent to which a particular pattern of learn- 
ing experiences would bring about correspond- 
ing changes in the organization of learning out- 
comes. Organization of learning was defined 
in terms of the degree of intercorrelation among 
tests of the various outcomes under considera- 
tion. In this article the problem will be studied 
by means of correlation analysis; in a second 
article, by the method of factor analysis. 


PROCEDURE: SELECTING THE GROUPS AND 
APPRAISING THEIR GENERAL EDUCATION 


Context of the study 





The present investigation utilized evaluation 
data compiled in the second testing program of 
the Study of Educational Progress, a study un- 
dertaken by the University of Chicago in coop- 
eration with a number of other educational insti- 
tutions interested in a long-term appraisal of 
the effects of a general education upon students
in secondary and higher education. Initial
testing of this second program was carried out in
the autumn of 1945, and the final testing in the 
spring of 1947. 

The University of Chicago assumed major 
responsibility for the construction and furnish- 
ing of tests, the scoring and analyzing of re- 
sults, and the reporting of data to the cooper- 
ating schools. The tests were constructed by 
committees consisting of examiners and in- 
structors in the appropriate subject fields of 
the College of the University of Chicago. In 
each subject field an attempt was made to de- 
fine some of the major objectives of general 
education and to construct test situations which 
would sample an individual’s level of compet- 
ence with respect to these objectives. It was 
thought that the objectives so defined constituted
an important and desirable list of objectives
for other programs of general education in the 
eleventh and twelfth grades. In any case, the 
tests finally constructed were designed to sam- 
ple the performance of students with respect to 
objectives involving knowledge, intellectual 
abilities, and intellectual skills. 


Selection of two groups of students 





The basic problem here involved selecting 
from among the participating schools two which 
offered quite contrasting curricula. A great 
contrast in curricula would mean, in effect, a 
great contrast in the pattern of learning exper- 
iences. After a preliminary survey of the par- 
ticipating schools, it was thought that this re- 
quirement could best be met by contrasting the 
program of the College of the University of Chi- 
cago with that of a conventional high school. Of 
course, the respective curricula had to meet 
certain minimum criteria if this study was to 
be sufficiently controlled: (1) they should have 
provided students with opportunities for growth 
toward most of the objectives measured in the 
testing program; (2) they should have provided 
a common core of courses so that the sample
of students selected could be treated as a group 
for statistical purposes; and (3) sources of in- 
formation should be available on the curriculum 
as it had existed during the period 1945-1947 
so that valid inferences about its emphases 
could be drawn. 

At the outset of this investigation it was de- 
cided to include the College of the University 
of Chicago as one of the two institutions to be 
contrasted with reference to their educational 
programs. The main justification for this
decision lay in the distinctive features of the
Chicago plan of college education (1:35-37). In the
first place, the College program recognized 
practically all of the objectives appraised in 
the Study of Educational Progress as objectives 
of its own educational program. By contrast, 
many of these objectives received only lip- 
service in the conventional secondary-school 
curriculum at the 11th and 12th grades while 
mastery of factual information retained a cen- 
tral prominence in much of teaching and learn- 
ing. Secondly, these objectives found expres- 
sion in an integrated system of prescribed gen- 
eral courses covering the principal fields of 
knowledge rather than in an assortment of sep- 
arate subject courses selected by the student. 
Thirdly, methods of instruction and learning
departed radically from conventional secondary
school practices in that they were based upon 
reading and analysis of original works rather 
than upon study of textbooks or other secondary 
references. Finally, the achievement of these 
objectives by students was measured by com- 
prehensive examinations in each of the fields 
rather than by adding up credits earned in
separate courses. These features are well suited
to the highly selected student body of the College.

After it had been decided to include the Col- 
lege of the University of Chicago as one of the 
two schools to be contrasted, other criteria 
dealing with the characteristics of the student 
groups assumed importance. The problem was 
simply one of matching two samples of students.
A set of three criteria—scholastic aptitude, in- 
itial scores on the achievement tests, and init- 
ial pattern of test intercorrelations—was found 
to eliminate all of the other possible schools. 
No one of them could match the College sample 
on either aptitude or initial test scores because 
of the highly selected nature of that student 
body. The best possible sample that could be 
selected was a pooled group of cases from two 
urban public high schools of the same system. 
This sample was able to match that of the
College on scholastic aptitude, but the latter
showed a statistically significant superiority on a
majority of the initial achievement scores. A 
third criterion for matching, namely, similar- 
ity of initial test intercorrelations, was found 
to be too rigorous. In this regard, it should be 
noted that there is no certainty that test inter- 
correlations will be comparable even if means 
and standard deviations are equal. It was felt, 
however, that the study was promising enough 
to be carried out despite these limitations. 

The groups finally selected included a sam- 
ple of 60 students of the College of the Univer- 
sity of Chicago and a sample of 63 students of 
two public high schools of the same system. 
Both groups were beginning the eleventh grade 
at the time of the initial testing. Although it 
would have been desirable to get at least 100 
cases in each group to insure greater stability 
of the basic data, the relatively small number 
who took part in the retesting precluded the ful- 
fillment of that ideal. 

It can be seen from Table I that the samples 
were equivalent with respect to scholastic apti- 
tude. The difference of 1.07 between means is 
not significant. These samples represent quite 
superior groups of students. The mean score 
of either group (11th grade at the time of test- 
ing) is equivalent to a percentile rank of 85 on 
the college freshmen norms for the 1944 edition 
of the A.C.E. Psychological Examination (13:12).


Appraisal of general education 





The condition of a common pattern of courses 
within either group of students for the academic 
years 1945-47 was fairly well fulfilled. In gen- 
eral, the College students had had two years of 
English, the first two years of humanities (arts 
and literature), the first two years of social 
sciences, one year of natural science, and one 
year of mathematics. However, they had not 
yet taken work in the biological sciences in the 
TABLE I

MEAN AND STANDARD DEVIATION OF THE A.C.E. PSYCHOLOGICAL EXAMINATION
TOTAL SCORES FOR THE COLLEGE GROUP AND
FOR THE HIGH SCHOOL GROUP

Statistic                                         College Group    High School Group

Number of cases                                         60
    Boys                                                33
    Girls                                               27
Mean                                                133.05
Standard deviation
Standard error of the mean

Difference between means
Standard error of the difference between means
Critical ratio
College. On the other hand, all of the high
school students had taken a year of eleventh 
grade English, but only about half had taken at 
least a semester of regular 12th grade English; 
most had had at least two semesters of a foreign 
language during the last two years of high school; 
most had taken physics and chemistry; and fin- 
ally, most had had at least a course in advanced 
or college algebra. In two important fields of 
general education, the visual arts and the bio- 
logical sciences, this latter group had had little 
formal instruction during these two years. 

Certain contrasts between the two programs 
of general education were noted. Whereas the 
curriculum of the College was organized in 
broad fields, that of the high schools was organ- 
ized in specific subjects. Sequences of courses 
comprised the broad fields in the College; single 
semester or year units comprised the actual 
structure of specific subjects in the high schools. 
The College curriculum was very largely pre- 
scribed, and individual differences found ex- 
pression primarily in differential rates of prog- 
ress through the program. The high school 
curriculum was only partly prescribed, and in- 
dividual differences commonly found expression 
in differential patterns of elective courses. 

Instructional materials used in the College 
were generally original works, while those in 
the high schools were mostly textbooks or sec- 
ondary references, with the exception of the 
English courses. Instructional methods in the 
College therefore tended to focus upon critical 
interpretation of such works, upon the develop- 
ment of abilities and skills other than the recall 
of information. Instructional methods in the 
high schools, with the exception of those in the 
English courses, tended to emphasize mastery 
of well-organized subject matter presented in 
textbooks; opportunities for critical reading 
were correspondingly limited. 

Evaluation instruments in the College were 
constructed from a carefully developed set of 
specifications which defined the major objec- 
tives of each course in terms of content and 
student behavior. Comprehensive examinations
formed an integral part of the program and con- 
stituted the final means of evaluating achieve- 
ment. The instruments used in the high school 
courses tended to be the short-answer type of 
test which stressed mastery of specific elements 
of content. 

In addition to the foregoing broad comparisons,
information was needed as to the nature
of the learning experiences afforded students.
Data were accordingly obtained which made
possible the drawing of inferences as to the
extent to which particular courses or combinations
thereof had contributed to intellectual growth
toward each of the objectives. Methods appropriate
to the respective programs were utilized
to gather such data. A committee of judges
consisting of evaluation experts in each subject
field then independently reviewed these data and
estimated the relative degree of emphasis given
each objective in the two curricula. Three
degrees of emphasis were established, corresponding
to the extent of preparation for accomplishing
the test exercises for each objective:
“marked,” “some,” and “little.” Table II
presents a comparison of the relative degrees
of emphasis in the College and high school
programs upon the objectives measured in this
study.

STUDY OF THE PROBLEM BY MEANS OF COR- 
RELATION ANALYSIS 


Hypothesis to be tested 





The following hypothesis will be tested in 
this section: Objectives which have been emphas- 
ized together in the learning experiences of 
students will tend to become interrelated as out- 
comes of such learning. 

This statement calls for some clarification 
with reference to the meaning of ‘‘emphasized 
together’’ and ‘‘interrelated.’’ By ‘‘emphas- 
ized together’’ is meant the extent to which op- 
portunities had been provided students for con- 
comitant growth toward two or more objectives 
of a course or entire program. Objectives 
treated in a single course were thought of as 
being ‘‘emphasized together’’ if the learning
experiences which contributed to the develop- 
ment of competence in one also contributed in 
some measure to the development of compet- 
ence in the others. It was believed that objec- 
tives in courses of the College and high school 
programs which had received at least ‘‘some’’ 
degree of emphasis in instruction could be con- 
sidered as having been ‘‘emphasized together’’ 
in the particular course. Certain objectives, 
on the other hand, were thought of as having 
been ‘‘emphasized together’’ in the entire two- 
year program, provided that they had received 
at least ‘‘some’’ emphasis in courses of dif- 
ferent subject fields. For example, if skills 
in the interpretation of various forms of data 
were stressed in the social studies, the phys- 
ical sciences, and mathematics, they could be 





The dissertation itself elaborates all of these problems and examines in detail
relationships between course emphases and various aspects of achievement, and
correlations between initial and final scores in relation to pattern of emphases.

These objectives include most of those measured in the second testing program of
the Study of Educational Progress. A few tests had to be omitted because of
practical considerations.
















JOURNAL OF EXPERIMENTAL EDUCATION 


TABLE II

RELATIVE DEGREES OF EMPHASIS IN THE COLLEGE AND HIGH SCHOOL PROGRAMS OF
THE OBJECTIVES APPRAISED IN THE STUDY

                                                              Relative Degree of Emphasis
Objective                                               No.   College    High School

Understanding of the meaning of words                     1   Little     Marked
Analysing sentences in grammatical terms                  2   Little     Some
Organizing notes into a logical order                     3   Marked     Some
Detecting errors in sentence structure                    4   Marked     Some
Analysis of painting and architecture                     5   Marked     Little
Knowledge about painting and architecture                 6   Marked     Little
Analysis of lyric poetry                                  7   Marked     Some
Reading and analysis—social issues                        8   Marked     Some
Relating knowledge of American history to
  social issues                                           9   Marked     Some
Knowledge of facts in American history                   10   Marked     Marked
Summarization of trends—American history                 11   Marked     Some
Summarization of trends—contemporary society             12   Marked     Little
Judging relations—American history                       13   Marked     Some
Judging relations—contemporary society                   14   Marked     Little
Application of knowledge—social sciences                 15   Some       Little
Judging validity of evidence—social issues               16   Some       Little
Knowledge of facts about the physical sciences           17   Some       Marked
Application of principles—physical sciences              18   Marked     Some
Judging validity of evidence—physical sciences           19   Marked     Little
Interpretation of data—physical sciences                 20   Marked     Some
Reading and analysis—natural sciences                    21   Marked     Little
Knowledge of facts in the biological sciences            22   *          *
Application of facts—biological sciences                 23   *          *
Knowledge of structure and language of
  mathematics                                            24   Marked     Marked
Ability to make elementary mathematical
  manipulations                                          25   Marked     Marked
Understanding of the fundamentals of verbal logic        26   Marked     Little
Graphical interpretation—physical phenomena              27   Marked     Little

*Practically none of the students in either group had work in the biological sciences; therefore,
the objectives were really not applicable.
regarded as having been ‘‘emphasized together’’
in the two-year program. Even an objective
such as acquisition of information could be
similarly regarded if it had received consider-
able emphasis throughout the various courses
of the educational program.

By ‘‘interrelated’’ as outcomes of learning 
is meant that the outcomes had become associ- 
ated—that is, the performance level of the in- 
dividual was relatively consistent or similar 
from one kind of outcome to another. A high 
degree of competence in the application of phys- 
ical science principles to new situations may 
thus be accompanied by, or associated with, a 
high degree of mastery of physical science
facts, and these in turn by a high degree of com-
petence in the graphical interpretation of phys-
ical phenomena. Or, such consistency might
extend to different subject fields, so that a high 
degree of attainment of these physical science 
objectives may be associated with a high degree 
of attainment of mathematics objectives and of 
social science objectives. Psychologically, 
‘‘interrelatedness’’ is akin to integration of the
individual’s behavior patterns in which they 
function not as compartmentalized aspects of 
his repertory of possible behavior but as as- 
pects which can jointly contribute to the solu- 
tion of problem-situations in many fields. For 
the purposes of the following analysis, however, 
‘‘interrelatedness’’ may be determined by the
degree of association between the tests of var- 
ious paired categories of outcomes. 

The preceding hypothesis was stated general- 
ly so as to subsume the many specific hypoth- 
eses which might have been offered to account 
for more restricted aspects of the data. Thus 
a more specific hypothesis than the stated one
would be that the effects of the College program 
should be such as to foster the development of 
a high degree of interrelation among thinking 
abilities and skills such as application of knowl- 
edge and interpretation of data, objectives which 
had received considerable emphasis in several 
subject fields. Similarly, it might be hypoth- 
esized that the effects of the high school pro- 
gram, in which predominant emphasis in most 
courses was placed upon the acquisition of in- 
formation, should be such as to encourage the 
development of a high degree of interrelation 
among ideas acquired in the several subject 
fields; that is, knowledge would tend to fuse as 
a learning outcome, but the various thinking 
abilities and skills would tend to emerge as 
relatively specialized or compartmentalized 
outcomes. 
Method of testing hypothesis





This hypothesis was tested by examining 
changes in coefficients of correlation over the 
two-year period of education. Correlations 
were grouped into three major categories.3
The first category included intercorrelations
grouped by subject field: English, humanities
(arts and literature), social sciences, physical
sciences, and mathematics.4 The second cat-
egory included correlations grouped according
to the presumed similarity of the mental pro- 
cesses involved in certain tests. For example, 
such groupings might include critical thinking, 
recall of information, and even smaller group- 
ings such as reading, language expression 
skills, application of principles, and interpre- 
tation of data. The third category included 
groupings of correlations by relative degree of 
emphasis. That is, correlations of all pairs 
of tests in which both corresponding objectives 
had received the same relative degree of em- 
phasis in the educational program were put in 
the same grouping. This procedure, of course, 
yielded three groups of such correlations rep- 
resenting the three degrees of emphasis— 
‘‘marked,’’ ‘‘some,’’ and ‘‘little.’’

For the first two groupings of correlations, 
then, the hypothesis was tested by comparing 
patterns of change in individual coefficients with 
the corresponding patterns of emphases of the 
related objectives in the courses of study. From
comparisons of this kind it was possible to de- 
termine whether or not correlations involving 
emphasized objectives had increased over the 
two-year period and whether or not correla- 
tions involving non-emphasized objectives had 
decreased or remained stable. To facilitate 
this kind of comparison, a set of tables was 
devised to present individual coefficients in 
adjacent columns representing the four situa- 
tions of this study: 


College Group. .... Initial Testing 
High School Group. . . Initial Testing 
College Group. .... Final Testing 
High School Group. . . Final Testing 


Subtractions were then made of the final and 
initial coefficients to yield changes. 
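
The subtraction step described above can be sketched in a few lines. The test pairs and coefficient values below are hypothetical placeholders, not figures from the study; coefficients are written without the decimal point, following the convention of the tables in this article.

```python
# Hypothetical coefficients for one group (decimal point omitted, so 310 means .310).
# Keys are pairs of test numbers; none of these values come from the study.
initial = {(17, 18): 310, (17, 20): 280, (18, 20): 405}
final = {(17, 18): 420, (17, 20): 300, (18, 20): 390}


def changes(initial_r, final_r):
    """Return the final coefficient minus the initial one for every pair of tests."""
    return {pair: final_r[pair] - initial_r[pair] for pair in initial_r}


delta = changes(initial, final)
for pair, change in sorted(delta.items()):
    # A positive change means the two tests became more closely associated.
    print(pair, f"{change:+d}")
```

The same function would be applied once to the College group and once to the high school group, giving the two columns of changes that the comparison relies on.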

It was assumed that the tests used in the 
present study were valid and reliable measures 
of the objectives which they purported to ap- 
praise. They were carefully made by expert 









3. The writer did not consider it necessary for the purposes of this study to try to
interpret each observed correlation and its changes in the two groups of students.
The task would have involved some 406 coefficients of correlation, a major study
in itself. It was believed that factor analysis of the correlation matrices
would yield more economical results than the study of individual coefficients.

4. A grouping of biological science tests was not included since only two were
involved.
TABLE III


DISTRIBUTIONS OF THE INTERCORRELATIONS OF THE TWENTY-NINE VARIABLES
FOR THE FOUR SITUATIONS STUDIED*


                                College Group         High School Group
Coefficient of Correlation      Initial    Final      Initial    Final
                                Tests      Tests      Tests      Tests


800 849 1 1 
750 799 1 na 
700 749 1 ee 2 
650 699 3 1 
600 649 12 eee 11 
550 599 17 18 
500 549 19 19 
450 499 21 31 
400 449 45 35 
350 399 44 51 
300 349 49 52 
250 299 47 38 
200 249 54 51 
150 199 38 22 
100 149 30 37 
050 099 14 
000 049 3 
-050 -001 5 
-100 -051 2 
-150 -101 os 
-200 -151 
-250 -201 +a els 

Total 406 406 4 





Median of all coefficients 312 347 229 311 
Mean of positive coefficients 325 350 249 327 
Mean of negative coefficients -043 -051 -066 -089 
Mean of all coefficients 319 346 227 311 
S. D. of all coefficients 155 154 162 167 


*In this and the following table the decimal point for coefficients has been omitted. 
test constructors and there seemed to be no
reason for believing that any of them did not
provide evidence of the kind of student behav-
iors desired. So far as reliability was con-
cerned, most of these tests seemed to be long
enough to insure stability of the results. Be- 
cause of this likelihood and because of the 
labor required, reliability coefficients were 
not computed. 

Before proceeding with the analysis of data 
for the various groupings, certain comments 
will be made about the basic data—the four 
correlation matrices—and about problems of 
statistical significance. 


Basic data and statistical considerations 





The basic data of the present study were the 
406 coefficients of correlation between the 29 
test scores for each of the four situations.5
These coefficients were obtained by the product-
moment method adapted for use with I.B.M.
electrical equipment. In order to make this
adaptation economically feasible, the original
raw test scores were transformed into single-
digit scores. A slight error was in-
troduced into the coefficients through this trans-
formation.
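
As a rough illustration of the computation just described, the sketch below calculates a product-moment coefficient and shows one way raw scores might be reduced to single digits. The equal-width banding in `to_single_digit` is an assumption made for the example, since the article does not say how the transformation was carried out, and the raw scores are invented.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def to_single_digit(scores):
    """Map raw scores onto 0..9 using ten equal-width bands (an assumed scheme)."""
    low, high = min(scores), max(scores)
    width = (high - low) / 10 or 1  # guard against identical scores
    return [min(9, int((s - low) / width)) for s in scores]


# Number of distinct pairs among 29 test scores.
n_pairs = math.comb(29, 2)

# Invented raw scores for two tests taken by the same ten students.
raw_a = [12, 25, 31, 40, 47, 52, 58, 66, 73, 85]
raw_b = [20, 22, 35, 33, 50, 49, 61, 70, 68, 90]

print("pairs of tests:", n_pairs)
print("r from raw scores:   ", round(pearson_r(raw_a, raw_b), 3))
print("r from single digits:", round(pearson_r(to_single_digit(raw_a), to_single_digit(raw_b)), 3))
```

Comparing the last two printed values gives a feel for the size of the error introduced by coarsening the scores.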

A question which should be raised is that of 
the significance of the obtained coefficients of 
correlation. It is possible to determine the 
probability that a given r is significantly differ-
ent from 0 by using formulas given by Fisher
(6:177, 198, 214). It was found that for the
College group where N = 60 a correlation as
large as .330 is needed for significance at the
1 percent level (t = 2.576). By this standard
185 or about 46 percent of the 406 initial correl-
ations and 216 or about 53 percent of the final
correlations could be considered significant.
When the same formula was applied to the high
school group with N = 63 it was found that a
value of .322 is needed for significance. For
the initial tests 117 or about 29 percent of the 
coefficients were this large; for the final tests, 
194 or about 48 percent of the coefficients. 
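
The threshold values quoted above can be approximated with a short calculation. The sketch assumes that the significance test rests on Fisher's z-transformation with a standard error of 1/sqrt(N - 3); the article cites Fisher but does not spell out the formula, so this is an assumption. Under it, the thresholds come out within about .002 of the reported .330 and .322.

```python
import math


def critical_r(n, t=2.576):
    """Smallest |r| whose Fisher z lies t standard errors away from zero."""
    z_needed = t / math.sqrt(n - 3)  # standard error of z is 1 / sqrt(N - 3)
    return math.tanh(z_needed)       # convert the z-value back to r


def percent_of_total(count, total=406):
    """Share of the 406 coefficients, in percent."""
    return 100 * count / total


for label, n in (("College", 60), ("High school", 63)):
    print(label, "N =", n, "critical r =", round(critical_r(n), 3))

for count in (185, 216, 117, 194):
    print(count, "coefficients =", round(percent_of_total(count)), "percent")
```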

A second question deals with the significance 
of changes in individual r’s from initial to final 
testing. The test of significance of a difference 
between two r’s obtained from independent ran- 
dom samples requires that one find the stand- 
ard error of the difference between the z-values 
corresponding to the observed r’s (9:211-218). 
The formula basic to this procedure is strictly 
applicable only to those comparisons of the 
present study involving an observed r for the 
College group and one measuring the same re-
lationship in the high school group, if the two
groups are considered independent or ‘‘uncor-
related’’ samples. However, this formula was
used as the best available means of estimating 
the standard error for differences involving 
correlations derived from two applications of 
a test to the same group—initial-to-final 
changes. Since the use of this formula gave 
somewhat larger estimates than would be ex- 
pected with ‘‘correlated’’ groups, these esti- 
mates were conservative. 

The formulas and table of z-values present- 
ed by Lindquist were used to compute the stand- 
ard error of certain large differences and to 
estimate the size of a difference needed to be 
significant. For the present study the upper 
limit for the standard error of the difference in 
initial-to-final comparisons was .187 for the
College group and .183 for the high school group.
Solving for z with t = 1.96, z-values must dif-
fer by .367 in the College group and by .359 in
the high school group in order for differences 
between r’s to be regarded as significant at the 
5 percent level. 
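
The figures in this paragraph follow from the usual standard error of a difference between two independent z-values, sqrt(1/(N1 - 3) + 1/(N2 - 3)). The sketch below reproduces them; the function names and the two example correlations at the end are illustrative only and do not come from the study.

```python
import math


def se_z_difference(n1, n2):
    """Standard error of the difference between two independent Fisher z-values."""
    return math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


def difference_is_significant(r1, r2, n1, n2, t=1.96):
    """True if the z-values of r1 and r2 differ by at least t standard errors."""
    z_gap = abs(math.atanh(r1) - math.atanh(r2))
    return z_gap >= t * se_z_difference(n1, n2)


for label, n in (("College", 60), ("High school", 63)):
    se = se_z_difference(n, n)  # same group tested twice, treated as independent
    print(label, "SE =", round(se, 3), "required z difference =", round(1.96 * se, 3))

# Invented initial and final coefficients for one pair of tests.
print(difference_is_significant(0.20, 0.55, 60, 60))
```

Because the same students took both testings, the true standard error would be smaller than this independent-samples value, which is why the article describes the estimates as conservative.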

It should be apparent from this discussion of 
statistical significance that conclusions drawn
from the data to follow are tentative in that they 
cannot be based upon large significant differ- 
ences in individual coefficients of correlation. 
Conclusions will emerge from trends in the data 
even though each individual difference may be 
too small to pass the test of statistical signifi- 
cance. 


Results of correlation analysis 


Before considering the changes in specific 
groupings of r’s, it may be helpful to note over-
all trends as indicated in Table III. In the case
of both groups of students, the effects of the
two years of education were to bring about a
small but statistically significant increase in
the mean of all correlations. At the final test-
ing, however, the mean r of each group was
still relatively low: .346 for the College group
and .311 for the high school group. It is inter-
esting to note that the two groups became more
alike in this respect after the additional educa- 
tion. 

If the hypothesis previously stated is to be 
borne out by the data, the intercorrelations of 
tests of objectives which had been given emph-
asis in the courses of study should have shown
substantial increases after the two years of 
additional instruction. This increase should 
have been all the more expected, since most 
of the intercorrelations were initially of low 
order of magnitude and indicated but little in- 
terrelationship among the various measures. 

The findings with respect to the stated hy-
pothesis will now be presented. They repre-
sent the results of rather detailed analysis of
changes in coefficients within the various
groupings. The summary data in Table IV in
general reflect the trends of the more detailed
analyses carried out as a part of the original
study.

5. Scores 1-27 are measures of achievement, while scores 28 and 29 are, respectively,
the Quantitative and Linguistic scores of the A.C.E. Psychological Examination.
These last two scores were obtained at the time of the initial testing only.

When intercorrelations of tests in the var-
ious subject fields were examined for the Col-
lege group, it was found that there was not
much evidence in English, humanities, and the 
social sciences of a direct, clear-cut relation 
between the pattern of emphases in the relevant 
courses and the pattern of changes in individual 
coefficients of correlation. Only in the physical 
sciences and mathematics was there such a
clear-cut relation, in the sense that increases 
in intercorrelation appeared to be associated 
with the degree of emphasis of corresponding 
objectives in the courses of study. 

Parallel data for the high school group gave 
somewhat more support to this hypothesis so 
far as subject-field groupings are concerned. 
Again there was a clear-cut relation between 
the pattern of emphases in the physical sciences 
and mathematics and the pattern of changes in 
correlations, in the sense that increases in in- 
tercorrelation seemed to be associated with the 
degree of emphasis of corresponding objectives 
in the courses. Also, intercorrelations in the 
social sciences tended to increase as a result 
of instruction in most of the objectives appraised 
in this field. The fact that the three objectives 
in humanities were not emphasized, and the in- 
tercorrelations of their tests did not increase 
especially, gives a different kind of support to 
this hypothesis; that is, at least the lack of 
emphasis of these objectives did not bring about 
an increase in intercorrelation of the tests. 

When intercorrelations of tests grouped by 
similarity of mental process were examined 
for both groups of students, there was little 
evidence of a direct intimate relation between 
the pattern of emphases, on the one hand, and 
the pattern of changes, on the other. 

Further evidence of the tendency toward spec- 
ialization in academic achievement can be seen 
in the comparison of mean intercorrelation of
tests within a grouping with the mean correla-
tion of those tests with the remaining achieve-
ment tests in the battery. In the first three
fields indicated in Table IV, the final mean in-
tercorrelation of tests was not much greater
than the final mean correlation of those tests
with the remaining achievement tests in the bat-
tery. These data suggest not only that those
tests did not have much in common among
themselves but also that the performance of
students was not very consistent from objective to
objective within a field, let alone throughout all
of the fields. In the fields of physical sciences 
and mathematics, however, it is evident that 
the tests did measure a relatively distinct set 
of skills and knowledges. It is apparent from 
the data on groupings by mental process that 
the tests within any grouping did not have much 
more in common among themselves than they 
did with the remaining achievement tests in the 
battery. Here, too, the performance of stud- 
ents evidently was not consistent on tests of 
similar objectives in different subject fields.

A crucial test of the hypothesis that emphas- 
ized objectives will tend to become interrelated 
as outcomes of learning is to study changes in 
r’s, that is, differences between final and initial 
r’s, grouped according to relative degree of 
emphasis. Distributive data for these differ- 
ences in r’s appear in Table V. 

Obviously the data for the College sample 
did not permit a rigorous test of this hypothesis 
since there were too few correlations in the 
‘‘some’’ and ‘‘little’’ categories to give stable 
results. It will be recalled in this connection 
that most of the objectives of the College 
courses had received a ‘‘marked’’ degree of 
emphasis; hence the small number of r’s in the 
remaining two categories. Some suggestive 
data, but again not conclusive, may be found in 
the mean change of +.043 for 190 ‘‘marked’’ r’s,
as against +.019 for the 152 r’s in which the de-
grees of emphasis were dissimilar. The dif-
ference of .024 between these means was not
significant, however (critical ratio = 1.50).6
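
The imbalance among the emphasis categories can be checked by counting pairs of objectives that share the same rating. The sketch below uses the ratings as read from Table II, leaving out objectives 22 and 23, which were not applicable. The assignment of ratings to objectives 1 through 8 depends on reading the table's columns in order and should be treated as an assumption. The count of 190 ‘‘marked’’ pairs for the College group agrees with the figure given above.

```python
from itertools import combinations

# Emphasis ratings keyed by objective number, as read from Table II.
# Objectives 22 and 23 are omitted because they were not applicable.
COLLEGE = {1: "Little", 2: "Little", 15: "Some", 16: "Some", 17: "Some"}
COLLEGE.update({n: "Marked" for n in range(3, 15)})
COLLEGE.update({n: "Marked" for n in (18, 19, 20, 21, 24, 25, 26, 27)})

HIGH_SCHOOL = {n: "Marked" for n in (1, 10, 17, 24, 25)}
HIGH_SCHOOL.update({n: "Some" for n in (2, 3, 4, 7, 8, 9, 11, 13, 18, 20)})
HIGH_SCHOOL.update({n: "Little" for n in (5, 6, 12, 14, 15, 16, 19, 21, 26, 27)})


def same_emphasis_pairs(ratings):
    """Count pairs of objectives whose two ratings are identical, by category."""
    counts = {"Marked": 0, "Some": 0, "Little": 0}
    for a, b in combinations(sorted(ratings), 2):
        if ratings[a] == ratings[b]:
            counts[ratings[a]] += 1
    return counts


print("College:    ", same_emphasis_pairs(COLLEGE))
print("High school:", same_emphasis_pairs(HIGH_SCHOOL))
```

With so few pairs outside the ‘‘marked’’ category in the College program, mean changes for the other two categories rest on very little data.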

The data for the high school group were like- 
wise only suggestive. Thus the mean change 
for the ‘‘marked’’ r’s was greater than that for 
either the ‘‘some’’ or ‘‘little’’ r’s, but the dif-
ferences between these mean values did not
approach an acceptable standard of significance. 

In short, there was evidence that this hypoth- 
esis is supported by certain aspects of the data 
but not by other aspects. Qualifications had to 
be introduced with reference to the nature of 
the outcomes being correlated; the data did not 
permit a blanket type of generalization to be 
made. Support came mainly from the grouping 
by subject field but not from the groupings by 
mental process or degree of emphasis. It, 
therefore, seems that interrelationships are 
more likely to be enhanced when objectives are 
in the same or similar subject fields. On the 
other hand, interrelationships are much less 
likely to be enhanced between objectives which 
involve similar mental processes but which ap-
ply to different subject fields, or between dis-
similar objectives in different subject fields
which have received a marked degree of emphasis.





6. This standard error of the difference between the means and the standard er-
rors in the following paragraph were computed from the formula for independ-
ent or uncorrelated groups. The resulting probabilities are thus conserva-
tive estimates of the likelihood of getting differences as large as these.
TABLE V


DISTRIBUTIVE STATISTICS FOR CHANGES IN CORRELATIONS BETWEEN TESTS 
OF OBJECTIVES WHICH HAD RECEIVED THE SAME RELATIVE DEGREE OF 
EMPHASIS IN THE EDUCATIONAL PROGRAM 
Implications for education





These findings have important implications 
for education, especially since so little is known 
about the effects of different ways of organizing 
learning experiences upon the organization of 
learning outcomes. One implication is that it 
is apparently easier to bring about interrela- 
tionship among outcomes within a subject field 
than among similar outcomes in different sub- 
ject fields if learning experiences are organized 
and presented by subject field. Another impli- 
cation is that even though considerable emphasis 
is placed upon the development of integration 
among learning outcomes both within a subject 
field and among different fields, influences of 
one kind or another may operate to limit the 
extent of such integration actually occurring. 

The influences which may operate to facili- 
tate or to limit the development of organization 
are varied. In the first place, the complexity 
of the outcomes may be so great that the intel- 
lectual versatility required for a high level of 
consistent performance throughout a wide range 
of outcomes may be rare among the student pop- 
ulation. Then too, students have particular ac- 
ademic preferences which contribute toward 
greater cultivation of some objectives than of 
others. Differences in various specialized scho- 
lastic aptitudes may operate similarly to en- 
gender specialization on the part of students. 
Perhaps one of the most important influences is 
that of content: mental processes, though seem- 
ingly quite similar in different fields, may be 
so dependent upon the nature of the particular 
content that they actually differ greatly from 
field to field. The very specialization of the 
subject matter may preclude the likelihood of 
generalization of an intellectual skill from one 
field to a radically different one. Of course, 
instruction may have been ineffective in bring- 
ing about organization, especially since courses 
of study usually follow subject matter divisions. 
Thus, students may not have developed skill in 
noting similarities or differences between prob- 
lem situations in different fields and in modify- 
ing or adapting procedures used in one field to 
attack similar problems in other fields. 

It is possible also that the techniques of ap- 
praisal were not able to reveal the extent of 
organization of these outcomes in these sam- 
ples of students. As suggested already, the 
content in many of the tests may have been too 
specialized to yield any more intercorrelation 
than that revealed. Perhaps a more general, 
less parochial content is required in tests of 
this kind. An additional influence in this con- 
nection is the range of talent in the groups 
studied. Possibly a much greater degree of 
interrelationship and organization would be in-
dicated if the range of talent were not so great-
ly restricted as in these groups of highly sel-
ected students.


Recapitulation of assumptions 





The major assumptions underlying this in- 
vestigation were the following: 


1. The objectives measured in this study 
constituted a fairly comprehensive list of those 
desirable for programs of general education at 
the eleventh and twelfth grades. 

2. The tests were reliable and valid meas- 
ures of these objectives. 

3. It was possible to quantify course emph- 
ases fairly accurately in three categories. 

4. Changes in various aspects of achieve- 
ment during this period could be attributed 
primarily to course experiences rather than to 
informal experiences outside the school. 

5. The extent of intercorrelation was a valid 
index of the extent to which learning had become 
organized. 

6. The statistical analyses reflected a trust- 
worthy picture of underlying relationships 
among the measures, although some individual 
items of data or differences between such items 
did not satisfy rigorous standards of statistical 
significance. 


SUMMARY 


This is the first of two articles dealing with
the relationship between a particular pattern
of learning experiences and the organization of
selected outcomes of general education. Org-
anization of learning was defined in terms of
the degree of intercorrelation among tests of
the various outcomes.

Coefficients of intercorrelation among twen- 
ty-seven achievement tests were obtained early 
in the eleventh grade and again at the close of 
the twelfth on two groups of students. One 
group had pursued a program of studies in a
conventional high school curriculum, while the 
other had completed the first two years of the 
four-year College program at the University of 
Chicago. 

This article describes the background of the 
investigation, the selection of the two groups 
of students, appraisal of their general educa- 
tion in terms of the measured objectives, hy- 
pothesis tested, statistical considerations, and 
testing of the hypothesis. 

Analysis of data was confined to a study of 
changes in coefficients of correlation for three 
groupings of tests: subject field, mental pro- 
cesses, and relative degree of emphasis of ob- 
jective. 

The second article of this series will pre- 
sent comparative data on the factor patterns 
as they existed before and after the two-year 
period of general education. 
THE ACADEMIC OVERACHIEVER:
STEREOTYPED ASPECTS


ROBERT COBB MYERS 
Educational Testing Service 
Princeton, New Jersey 


Incidental to a study being made for the Col- 
lege Entrance Examination Board comparing 
interests and attitudes of overachieving and 
underachieving college students, we have come 
upon some very interesting evidence of the ex- 
istence of a well-structured stereotype concern- 
ing the academic overachiever. It will be the 
purpose of this article to present this evidence 
and the components of the stereotype. First 
we will describe briefly the locus of the study 
and the data from which our conclusions are 
drawn. The study as a whole will be reported 
upon at a later time. 

The study so far has been focussed upon an 
Eastern women’s liberal arts college which is 
part of a state university. At this college we 
administered an attitude-interest questionnaire 
to members of the Class of 1951 as a group 
just after their admission as freshmen. Later, 
the questionnaire was administered, individually 
at the college, to applicants for the Class of 
1952 along with the college’s usual application 
forms. Questionnaires received from women 
not subsequently admitted as freshmen were 
discarded. It is important to bear in mind that 
members of the Class of 1951 answered the 
questionnaire after they had been admitted to 
college, whereas members of the Class of 1952
answered the same questionnaire before
they were admitted, and before they knew 
whether or not they would be admitted. In the 
ensuing discussion we will refer to the Class 
of 1951 as postadmission cases, and to the 
Class of 1952 as preadmission cases. The total
numbers of cases were: 355 postadmission, and
362 preadmission. 

The questionnaire comprised 152 items, of 
which four served to identify the respondent, 
and the remainder were attitude-interest ques- 
tions of the ‘‘cafeteria’’ or multiple-choice in- 
tensity scale answer type: 29 four-point scales, 
91 + See 27 seven-point, and one open- 
end. 











It was desired to find if the responses of 
overachievers differed in any significant re- 
spect from those of underachievers, and it 
therefore became necessary to prepare an 
achievement index for each respondent. An 
academic overachiever may be defined loosely 
as a student whose course grades or marks ex-
ceed those of other students having the same 
basic ability or aptitude. Operationally, in the 
present instance, basic ability was measured 
by the verbal and mathematical sections of the 
Scholastic Aptitude Test of the College Entrance 
Examination Board, and course grades were 
represented by freshman-year grade-point- 
average. As in the case of the attitude-interest 
questionnaire, the aptitude tests had been ad- 
ministered to the students prior to entering up- 
on their freshman course work. 

After the freshman-year grade-point-aver- 
ages had been computed for the Class of 1951, 
these were correlated with the aptitude test 
scores, and a linear regression equation pre- 
pared.2 Each student was then given an ach- 
ievement index depending upon the distance 
that her grade-point-average fell above or be- 
low the line of regression of grade-point-aver- 
age on aptitude scores. Those above the line 
were, by definition, overachievers, and were 
assigned high achievement indices, while those 
below it were underachievers and received low 
achievement indices. However, in order to 
emphasize as much as possible whatever ques- 
tionnaire response differences might be found 
between these two broad groups, it was decided 
to compare only the responses of the extreme 
cases (those having the highest and lowest ach- 
ievement indices) to each other, rather than 
simply comparing all those below the regres- 
sion line with those above it. For this purpose, 
then, the 37 students having the highest achieve- 
ment indices were arbitrarily selected to rep- 
resent overachievers, and their questionnaire 
responses were matched against the responses 





1. The questionnaire was developed, pretested and administered under the direction
of H. S. Conrad, formerly technical consultant to the Educational Testing Ser-
vice and currently Chief, Research and Statistical Services, U. S. Office of Ed-
ucation. The aid and advice of D. G. Schultz and W. B. Schrader, research asso-
ciates of Educational Testing Service, has been particularly helpful.

2. The verbal aptitude test correlated .44 with grade-point-average, and the math-
ematical aptitude test correlated .39. The multiple correlation of these two
with grade-point-average is .51.
TABLE I


COMPARISON OF EXTREME OVERACHIEVERS AND EXTREME UNDERACHIEVERS
ON APTITUDE TEST SCORES AND FRESHMAN-YEAR GRADES


Ability and Achievement        Underachievers         Overachievers
Measures                        M         σ            M         σ

CEEB Aptitude Tests:
   Verbal                      520      93.58         505     100.50
   Mathematical                487      60.17         471      90.13
Freshman-Year GPA             3.39        .39        1.75        .35
TABLE II


P VALUE OF 45 ATTITUDE-INTEREST QUESTIONNAIRE ITEMS SHOWING RESPONSE
DIFFERENCES BETWEEN UNDERACHIEVERS AND OVERACHIEVERS


P                  Number of Items    Cumulative Total

.01                       2                   2
> .01 < .02               1                   3
> .02 < .05               3                   6
> .05 < .10               4                  10
> .10 < .20               9                  19
> .20 < .30              12                  31
> .30 < .50              14                  45
of the 37 students having the lowest achievement
indices. Thus, when we refer below to the re-
sponses of overachievers and underachievers,
it should be realized that we mean the respon-
ses of the 37 highest overachievers and the re-
sponses of the 37 lowest underachievers in the
postadmission group, the Class of 1951. It
might further be mentioned in passing that little
difference was found in basic ability, as meas-
ured by the College Board verbal and mathemat-
ical aptitude tests, between these two categor-
ies of students. They are, however, strikingly
different in freshman-year grade-point-average.
This comparison is shown in Table I.
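
The construction of the achievement index can be sketched as a regression residual. The example below is a simplification resting on two assumptions: it regresses grade-point-average on a single aptitude score, whereas the study used both the verbal and the mathematical scores, and it treats larger grade-point values as better grades. The data are invented, and on a grading scale where smaller values are better the sign of the index would be reversed.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum(
        (a - mean_x) ** 2 for a in x
    )
    return slope, mean_y - slope * mean_x


def achievement_indices(aptitude, gpa):
    """Distance of each student's GPA above (+) or below (-) the regression line."""
    slope, intercept = fit_line(aptitude, gpa)
    return [g - (intercept + slope * a) for a, g in zip(aptitude, gpa)]


def extremes(indices, k):
    """Positions of the k lowest and k highest achievement indices."""
    order = sorted(range(len(indices)), key=lambda i: indices[i])
    return order[:k], order[-k:]


# Invented aptitude scores and grade-point-averages for six students.
aptitude = [450, 480, 500, 520, 540, 560]
gpa = [2.0, 3.1, 2.4, 2.6, 3.6, 2.9]

index = achievement_indices(aptitude, gpa)
under, over = extremes(index, 2)
print("underachievers (positions):", under)
print("overachievers (positions): ", over)
```

In the study the same idea was applied with k = 37 at each extreme of the Class of 1951.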

Questionnaire responses of overachievers 
were matched against those of underachievers 
and the Chi-square technique applied. Forty- 
five out of the 148 attitude-interest questions 
were found to discriminate between the two 
groups, with P’s ranging from .01 up to >.30
<.50 as shown in Table II. Only these 45 items
were selected for analysis. 
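
For readers unfamiliar with the technique, the Chi-square comparison of two groups on a single item can be sketched as follows. The response counts are invented for a hypothetical four-point item answered by 37 students in each group; the article does not report item-level counts. The function returns only the statistic and its degrees of freedom, leaving the probability to be read from a table.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of counts.

    Each inner list is one group's response counts across the answer options.
    Every column must contain at least one response, or an expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Invented counts on a four-point item: overachievers, then underachievers.
responses = [
    [5, 10, 14, 8],
    [12, 13, 8, 4],
]
statistic, dof = chi_square(responses)
print("chi-square =", round(statistic, 2), "with", dof, "degrees of freedom")
```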

So much then for the determination of re- 
sponse differences between students who sub- 
sequently were found to be either extreme over- 
achievers or extreme underachievers. Now let 
us turn to the problem immediately at hand: is 
the substance of these overachiever-underach- 
iever differences surprising, or is it generally 
as would be expected? When one thinks about 
the expression, ‘‘academic overachiever,’’ a 
pattern of ideas, or picture, forms in one’s 
mind concerning what attributes such a person 
would have that would distinguish him from his 
polar opposite, the ‘‘academic underachiever.’’ 
This is the original social-psychological meaning
of the term ‘‘stereotype’’ as first used by
Lippmann, and the one to which we are here
returning. How accurate is this picture, or
stereotype, as compared to the differences as
we found them?

A direct approach to such a problem is to 
ask a number of persons to examine the ques- 
tionnaire items and to make a judgment in each 
case, based upon the judge’s own mental image 
of the ‘‘ideal overachiever,’’ as to how the
response of overachievers would be expected to
differ from that of underachievers. We used 
this direct approach, but also made use of a 
rather unusual indirect approach which will be 
explained shortly. 


The Direct Approach: Stereotype of Experts 





We called upon 25 ‘‘experts’’ (members of 
the Test Development Department of Educational
Testing Service), who had no knowledge whatever
of the actual overachiever-underachiever
differences we had found, to indicate for each 
of the 45 selected questionnaire items the man- 
ner in which they would expect overachievers 
to respond as compared to underachievers. The 
actual procedure was this: Each judge was
given a blank copy of the questionnaire. He was 
asked to put a check-mark beside each of the 
45 designated items, and was then informed 
that these items appeared to us, upon statis- 
tical analysis, to be those which best discrim- 
inated between overachievers and underachiev- 
ers at the women’s college where our experi- 
mentation was being conducted. The first item 
to be judged happened to be Number 14, and 
this was used to explain how the items should 
be evaluated by the judges. This item was in 
a section having to do with reasons for going 
to college, for which five possible rated an- 
swers were as follows: 


ITEM: It is good for a girl to get away from home for awhile.

POSSIBLE RATED ANSWERS:
1. Does NOT apply or is of NO importance
2. Of SLIGHT importance
3. Of MODERATE importance
4. Of CONSIDERABLE importance
5. Of GREAT importance



The judges were instructed that if they thought 
overachievers would rate this item as being of 
greater importance than underachievers as a 
reason for going to college, they should indicate 
this belief by marking the item with a plus (+); 
if, however, they believed the converse would 
be more likely, they should mark the item with 
a minus (-). The rest of the items were eval- 
uated by the judges in a similar manner: Be- 
lief that overachievers would give greater as- 
sent to, or place greater importance upon, the 
content of an item was indicated by a plus (+), 
and belief in the opposite situation was indica- 
ted by a minus (-). 

When the judgments of the 25 ‘‘experts’’ 
were tabulated, the result was found to have 
been as shown in Table III. 

It should be noted that in the case of 34 out 
of the 45 items, correct judgments were made 
by a majority of the ‘‘expert’’ judges of the 
manner in which overachievers would respond 
as compared to underachievers. This finding 
can be generalized into the statement that: 75.6 
percent (34/45) of the differentiating question- 
naire items were found to be within the ‘‘acad- 
emic overachiever stereotype’’ of our ‘‘expert’’ 
judges. The content of the items both within 
and outside the experts’ stereotype is shown 
below as part of Table IV. 
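
The 34-of-45 figure can be checked directly from the tabulation in Table III. The short sketch below is illustrative only: the variable and function names are ours, and it assumes that "a majority" means more than 50 percent of the 25 judges (that is, 13 or more). Rows of Table III with no items are omitted.

```python
# (percent of judges judging correctly, number of items), from Table III
TABLE_III = [
    (100, 6), (96, 10), (92, 5), (88, 3), (84, 3), (80, 2), (76, 1),
    (68, 1), (64, 1), (56, 1), (52, 1), (44, 1), (28, 1), (24, 1),
    (20, 1), (16, 2), (0, 5),
]


def items_with_majority_correct(rows, threshold_percent=50):
    """Count the items judged correctly by more than threshold_percent of judges."""
    return sum(count for percent, count in rows if percent > threshold_percent)


total_items = sum(count for _, count in TABLE_III)
within_stereotype = items_with_majority_correct(TABLE_III)
share = 100 * within_stereotype / total_items

print(f"{within_stereotype} of {total_items} items ({share:.1f} percent)")
```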











The Indirect Approach: Stereotype of Students 





We then turned to the different social situa- 
tions under which the questionnaire had been 
administered to the Class of 1951 and to the 
Class of 1952 in order to learn something about 
the stereotype held by the students themselves 
at our women’s college. It will be recalled that 
TABLE III

PERCENT OF 25 JUDGES CORRECTLY JUDGING DIRECTION OF
THE DIFFERENCES OF RESPONSES OF OVERACHIEVERS
FROM RESPONSES OF UNDERACHIEVERS

Percent of Judges
Making Correct        Number of    Cumulative
Judgments             Items        Total
100                       6             6
 96                      10            16
 92                       5            21
 88                       3            24
 84                       3            27
 80                       2            29
 76                       1            30
 72                       -            30
 68                       1            31
 64                       1            32
 60                       -            32
 56                       1            33
 52                       1            34
 48                       -            34
 44                       1            35
 40                       -            35
 36                       -            35
 32                       -            35
 28                       1            36
 24                       1            37
 20                       1            38
 16                       2            40
 12                       -            40
  8                       -            40
  4                       -            40
  0                       5            45

the members of the Class of 1951 answered the
questionnaire after they had been safely admitted
to the freshman class, whereas the Class
of 1952 respondents were seeking admission to
college at the time they answered the question- 
naire. That there would be distinct motivation- 
al differences between these two Classes of re- 
spondents appeared obvious, and the effect that 
these motivational differences might have upon 
responses was the subject of a good deal of con- 
jecture. 

Paul Wallin has pointed out that: ‘‘An appli- 
cant who is being interviewed for a job, or a 
prisoner who is applying for parole, realizes 
that his answers to a questionnaire or in an in- 
terview, may decisively affect his obtaining a 
job or parole. Consequently he may be under 
some pressure to give answers which he considers
to be favorable.’’3 For our purposes,
the sense of Wallin’s statement might be gen- 
eralized as follows: The interposition of a qual- 
ifying examination for a desired goal will ord- 
inarily shift overtly expressed opinions in the 
direction deemed to be in conformance with a- 
chievement of the goal, i.e., in the direction of 
one’s stereotype of the ideal goal achiever. This 
principle is bound up intrinsically with the com- 
mon understanding that people rationalize their 
opinions (substitute ‘‘good’’ opinions for ‘‘real’’

Therefore, assuming that students admitted
to the Class of 1952 were from the same gener- 
al population, or causal universe, as those ad- 
mitted to the Class of 1951, it was considered 
feasible to compare the average answers of the 
preadmission respondents (Class of 1952) to 
those of the postadmission respondents (Class 
of 1951) in order to learn something of the stu- 
dents’ own stereotype of the successful college 
student. This has been done, and will be re- 
ported upon fully in another paper. 

For present purposes, we are interested 
only in comparing our 45 overachiever-under- 
achiever items with the average preadmission- 
postadmission response differences. After 
making this comparison, it was found that in 
the case of 30 out of the 45 items the responses 
of preadmission respondents differed from those 
of postadmission respondents in the same man- 
ner as responses of overachievers differed 
from those of underachievers. Thus, we can 
make the statement that: 66.7 percent (30/45)
of the differentiating questionnaire items were 
found to be within the ‘‘successful student’’ 
stereotype of the preadmission respondents. 
The content of the items both within and outside 
the students’ stereotype is shown in Table IV. 
The proportion of items in each stereotype cat- 
egory is shown graphically in Figure 1. 
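
The four stereotype categories follow from the two totals above (34 items within the experts' stereotype, 30 within the students') together with the 26 items that Table IV lists as falling within both. The arithmetic can be sketched as below; the names are ours and the code is only a check on the published counts, not part of the original analysis.

```python
TOTAL_ITEMS = 45
within_experts = 34    # majority of the 25 judges correct
within_students = 30   # preadmission shift in the overachiever direction
within_both = 26       # Section I of Table IV

# Split each stereotype into the shared part and the part unique to it.
experts_only = within_experts - within_both
students_only = within_students - within_both
# Items in at least one stereotype, by inclusion-exclusion.
within_either = within_experts + within_students - within_both
outside_both = TOTAL_ITEMS - within_either


def percent(count):
    """Share of the 45 differentiating items, in percent."""
    return 100 * count / TOTAL_ITEMS


for label, count in [
    ("within both stereotypes", within_both),
    ("experts only", experts_only),
    ("students only", students_only),
    ("outside both stereotypes", outside_both),
]:
    print(f"{label}: {count} items ({percent(count):.2f} percent)")
```

The counts come out as 26, 8, 4 and 7, matching the section sizes given in Table IV. Note that 7/45 is 15.56 percent, which the paper reports as 15.5.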

















Figure 1

PROPORTION OF THE 45 SIGNIFICANT OVERACHIEVER ITEMS
WITHIN EACH STEREOTYPE

[Diagram: Within Experts’ Stereotype (75.6%); Within Students’
Stereotype (66.7%); Within Both Stereotypes (57.8%); Outside Both
Stereotypes (15.5%)]


opinions) when some specific purpose, desire 
or goal is involved. Thus, when seeking parole, 
a prisoner’s expressed opinions will be colored 
by his mental image, or stereotype, of the 
ideal successful parolee; a job applicant’s op- 
inions will be colored by his stereotype of the 
ideal successful jobholder in the occupation of 
his choice. Likewise, it would be expected 
that the expressed opinions of an applicant for 
admission to college would be colored by his 
stereotype of the ideal successful college stu- 
dent. 





Table IV is divided into four sections. The 
first section presents those overachiever item- 
attributes which were found to be within the ster- 
eotype of both experts and students; the second 
section presents those that were only within the 
experts’ stereotype; the third section presents 
those that were only within the students’ ster- 
eotype; and the fourth section contains the re- 
maining item-attributes which were outside both 
stereotypes. The table as a whole is so arrang- 
ed as to illustrate for each item: 
TABLE IV

Columns: Overachiever response, and direction of difference from
underachievers. | Item content, and P value of difference.

[Graphic response scales and direction arrows are not recoverable
from the scan; item content and P values are listed.]

I. MEASURED ATTRIBUTES OF OVERACHIEVERS WITHIN STEREOTYPES
OF BOTH EXPERTS AND PREADMISSION RESPONDENTS
(26 of 45 items)

Reasons for Coming to College
(scale: NO IMPORTANCE to GREAT IMPORTANCE)

The atmosphere at this college is favorable to serious study.
(P >.30 <.50)
This kind of college has high intellectual standards.
(P >.20 <.30)

Characteristics of Girl Friends

Friends with similar interests in one or more school subjects.
(P >.20 <.30)
Friends as movie-going companions. (P >.05 <.10)
Friends who are fun to go shopping with. (P >.30 <.50)
Friends who are interested in making high grades in school.
(P >.30 <.50)
Friends who have a good educational and cultural background.
(P >.30 <.50)
Friends who are generous about spending plenty of money for a
good time. (P >.02 <.05)
Friends who have interesting ideas on current affairs.
(P >.30 <.50)
Friends who are good thinkers--independent and original.
(P >.10 <.20)

Teacher Relations & Study Habits
(scale endpoint shown: ALWAYS)

Have your teachers paid attention to you as an individual?
(P >.05 <.10)
Do you use odd moments, like time between classes, to review
what you have learned? (P >.10 <.20)
Is what you consider important in your courses the same as what
your teachers consider important? (P >.02 <.05)
Is it easy for someone to persuade you to do something else
instead of studying? (P >.10 <.20)
Do you study much more efficiently under the pressure of
immediate necessity? (P >.01 <.02)
Do you lack confidence in your ability to do satisfactory
academic work? (P >.30 <.50)
Have your teachers seemed to care whether the students learned
or not? (P >.05 <.10)
If it were expected of you, would you be able to maintain an
average of 5 hours per day of study in college? (P >.30 <.50)
Do you feel that the higher the grades a girl gets in college
the more she will amount to after college? (P >.30 <.50)
Do you place importance on the idea that your college career
will train you to participate actively in your community?
(P >.10 <.20)

Discussion Topics

Regulation of labor unions by law. (P >.20 <.30)
Strategy for strengthening the United Nations. (P >.10 <.20)
A new night club. (P >.30 <.50)
Rival techniques of color-photography in motion pictures.
(P >.20 <.30)
Some needed civic improvements. (P >.05 <.10)
Current developments in U. S. foreign policy. (P >.10 <.20)

II. MEASURED ATTRIBUTES OF OVERACHIEVERS WITHIN STEREOTYPES
OF EXPERTS, BUT OUTSIDE THAT OF PREADMISSION RESPONDENTS
(8 of 45 items)

Reasons for Coming to College

It is good for a girl to get away from home for awhile.
(P >.30 <.50)
Want to come to this college in order to be away from home
(too far for frequent week-ends at home). (P >.30 <.50)

Characteristics of Girl Friends

Friends who like to play cards (bridge, rummy, etc.). (P .01)
Friends who dare to be first in setting new styles in dress and
behavior. (P >.10 <.20)

Discussion Topics
(scale: NOT INTERESTED to WARMLY INTERESTED)

Relative merits of different "name" dance bands. (P >.02 <.05)
Recent good radio jokes or "gags." (P >.20 <.30)
Recent fashion trends. (P >.30 <.50)
Talk or discussion about boy friends. (P >.20 <.30)

III. MEASURED ATTRIBUTES OF OVERACHIEVERS WITHIN STEREOTYPES
OF PREADMISSION RESPONDENTS, BUT OUTSIDE THAT OF EXPERTS
(4 of 45 items)

Reasons for Coming to College

Nearby men's college will make possible a balance between work
and play. (P .01)
The type of girl I wish to associate with socially goes to this
kind of college. (P >.30 <.50)
My children will be proud of the fact that I went to this kind
of college. (P >.20 <.30)

Characteristics of Girl Friends

Friends who have a collection of popular phonograph records.
(P >.20 <.30)

IV. MEASURED ATTRIBUTES OF OVERACHIEVERS OUTSIDE STEREOTYPES
OF BOTH EXPERTS AND PREADMISSION RESPONDENTS
(7 of 45 items)

Reasons for Coming to College

College acquaintances and contacts are likely to prove
advantageous in finding a position after graduation, and in
locating better positions later on. (P >.20 <.30)
College is a good place to meet the type of person I'd like to
marry. (P >.30 <.50)

Teacher Relations & Study Habits
(scale endpoint shown: ALWAYS)

Do you feel that much of what you are learning in school is
wasted effort? (P >.20 <.30)
After an examination, do you feel that the teacher has tricked
you? (P >.20 <.30)
Do you read slowly? (P >.10 <.20)
Has most of your school work been rather closely related to your
vocational goal? (P >.20 <.30)
Do you think that people respect you more if they believe you
don't study a great deal? (P >.10 <.20)


















































Possible terminal responses. 

Comparative length of the response con- 
tinuum. 

Mean overachiever response. 

Direction of difference of overachiever 
response from underachiever response 
(indicated by arrow). 

Item content. 

P value of the difference between respon- 
ses of overachievers and underachievers 
(parenthetically following item content). 


It should particularly be emphasized that in 
presenting the average response of overachiev- 
ers, and the direction in which this response 
differs from that of underachievers, we do not
mean to imply that this represents actual ob- 
servable differences in activity between these 
two groups of students. It may not even rep- 
resent actual attitudinal differences, although 
we may suspect that it does. All that can be 
claimed is that, for the items presented, the 
overt pencil-and-paper responses of our over- 
achievers and underachievers appeared to be 
typically different. It is the probability of this 
difference, rather than the debatable question 
of whether or not an attitude, or ‘‘an inner 
tendency to act,’’ is being precisely measured, 
that is important to us. Observe, for example, 
the ‘‘read slowly’’ item in Section IV of Table
IV. It is most unlikely that our overachievers
actually read more slowly than the underachiev- 
ers, and, as far as we know, they may not 
even believe, however erroneously, that they 
do so. But, the important finding for us is that 
these overachievers say that they read more 
slowly than underachievers. 4 

For predictive purposes, it is the ascertain- 
ment of a typical difference in response be- 
tween potential overachiever and underachiever 
that is crucial; not the precise point on the re- 
sponse continuum where the answer may fall. 

If we find that such differences remain stable 
from one entering class to another at our women’s
college, we may well conclude that we
have a useful predictive device for women
applying for entrance to that college. Whether or
not such differences would be found to stand up
with applicants for other women’s colleges, or
with men, is another matter. For present purposes,
however, we feel that it is useful to
know that a core of items may be largely
stereotypical, that is, easily rationalizable in
terms of what ‘‘teacher would think best,’’ and
yet apparently distinguish, and in advance of
the fact, between academic overachievers and
underachievers. Can it be perhaps that underachievers
are students who are less well acquainted
with the stereotype, and overachievers
those who are better acquainted with the ‘‘good’’
responses?





SUMMARY 


An attitude-interest questionnaire was administered
just after admission to the freshman
class at a women’s eastern college. At
the end of the freshman year the highest
overachievers and the lowest underachievers were
determined by comparing College Entrance Examination
Board verbal and mathematical scholastic
aptitude scores with freshman-year
grade-point-average. Subsequent analysis of
questionnaire item responses by the Chi-square
technique showed 45 items to have differentiated
between these overachievers and underachievers.
The majority of these items (57.8 percent)
were found to be within the ‘‘academic
overachiever’’ stereotype of a group of expert
judges as well as within the ‘‘successful college
student’’ stereotype of the following year’s
entering freshmen at this college. Only 15.5 percent
of the analyzed items were found to be
wholly outside the stereotypes of both experts
and students.








4. This finding has recently been corroborated by Everett M. Woodman in his
study of first semester freshmen overachievers and underachievers in seven
women's junior colleges and two women's senior liberal arts colleges. See
The Construction of a Measurement of Certain Non-Intellective Determinants
of Academic Success in College, unpublished Ph.D. dissertation, Boston
University, School of Education, 1949.
RELATIONS BETWEEN AUDITORY ABILITIES
AND READING ABILITIES:
A PROBLEM IN PSYCHOMETRICS


DOROTHEA W. F. EWERS 
University of Chicago 


INTRODUCTION 


This article reports an exploratory study 
which is an attempt to relate reading disabili- 
ties to auditory defects. It presents the hypoth- 
esis that in addition to peripheral defects of the 
nervous system, particular individuals of nor- 
mal intelligence and apparent normal hearing 
may have some central defect of the nervous 
system which prevents or impedes learning to 
read by certain of the present-day methods of 
teaching reading.

It also represents an attempt to show why 
evidence concerning the probable relation be- 
tween reading and other variables is often neg- 
ative or appears contradictory. 


THE PROBLEM 


The problem of the investigation is to relate 
reading abilities to auditory abilities tested by 
acoustical tests. This is to be accomplished
by determining what correlations exist between
a large number of auditory tests and each of
two reading tests, and then attempting the
psychological interpretation.


THE SUBJECTS 


The experimental population consists of 140 
students of the Whiting High School of Whiting,
Indiana. This group is approximately three-quarters
of the experimental population tested
by J. E. Karlin for his ‘‘Factorial Isolation of
the Primary Auditory Abilities’’ (11). His group
consists of volunteers. The present group consists
of all those members of Karlin’s group
who did not graduate in mid-year 1941-42.

The status of the group with regard to grade 
placement, chronological age, mental age, in- 
telligence, and reading achievement is sum- 
marized in Table I. 

The group is homogeneous with regard to 
cultural background as judged by occupation of 
the fathers who are semi-skilled workers in an 
oil refinery. It is composed about equally of 
children of foreign-born parents and children 
of American-born parents, is about half biling- 
ual and half not, has been educated almost in 
its entirety in Whiting schools—about half hav- 
ing had their elementary education in the public 
schools and half in the Catholic schools, some 
of which teach in more than one language, and 
is almost evenly divided into boys and girls. 


THE VARIABLES 


The basic criterion variables are two read- 
ing tests: the Advanced Test, Form AM, of the 
Iowa Silent Reading Tests, and the Standardized 
Oral Reading Paragraphs by William S. Gray. 
The auditory tests consist of 25 group tests
administered by Karlin by means of disc recordings
and 14 group and 3 individual tests administered
by Ewers by the same means, plus the
Western Electric 4A Audiometer test. Brief
descriptions of the auditory tests follow. A
review of the literature, more complete descriptions
of the auditory tests and descriptions
of the reading tests, frequency distributions
of the scores obtained for Ewers’ group,
and the four-fold tables of Chi-square values
are presented in the Appendixes I-V, of Ewers’
thesis on file in the University of Chicago Libraries.
Original data are available from the author.

Unless otherwise indicated, each of the auditory
tests, 1 through 26, was devised by Karlin.
All of these were administered by him.
Each of tests 27 through 43 was devised, recorded,
and administered by Ewers. Some
are original, others are adaptations of tests
or are based upon ideas for tests to be found
in the literature, particularly that concerning
diagnosis of reading difficulties.

the Whiting Public Schools, Whiting, Indiana; his principals, Mr. Grubb, Mr.
Riordan, Mr. Dougherty, and Mr. Snapp; and the teachers and students who so
generously contributed their time; Dr. L. L. Thurstone, director of the
Thurstone Laboratories of the University of Chicago for his counsel during the
processing of the study and for some financial aid for its pursuit; Dr. F. A.
Kingsbury, acting chairman of the Department of Psychology, University of
Chicago; and Dr. W. D. Neff and Mr. J. M. Butler, members of the staff of the
department for their advice during the preparation of the manuscript; Dr. J. E.
Karlin, now of the staff of the Bell Telephone Laboratories for the data
furnished from his study (11); and my husband, Mr. E. A. Ewers, a member of the
staff of the Chief Engineer's Office, Chicago Area Engineering Department, Ill-



1. Auditory Discrimination for Vowels and Consonants:
   S = 90 pairs of monosyllabic or disyllabic words;
   R = judgment ‘‘same’’ or ‘‘different.’’

2. Auditory Fusion Memory Span:
   S = list of 20 nonsense words of increasing length, spelled out;
   R = write each correctly.

3. Haphazard Speech:
   S = 15 short phrases changing irregularly in pitch and loudness;
   R = write words recognized.

4. Illogical Grouping:
   S = 10 short sentences with words grouped illogically;
   R = write words recognized.

5. Intellective Masking:
   S = 24 items of a word or series of words masked increasingly by continuous discourse;
   R = write words recognized.

6. Loudness Discrimination for Complex Sounds:
   S = 30 pairs of complex sounds;
   R = judgment which is louder.

7. Loudness Discrimination for Pure Tones. Loudness test, Seashore Tests of Musical
   Talent, Series A:
   S = 50 pairs of tones;
   R = judgment which is stronger.

8. Loudness Discrimination for Pure Tones of Short Impulse:
   S = 40 pairs of tones generated electrically;
   R = judgment ‘‘louder’’ or ‘‘softer.’’

9. Memory for Emphasis:
   S = two prose passages and one verse in which certain words are emphasized correctly
       and some incorrectly;
   R = encircle in written text, each word emphasized after total presentation.

10. Memory for Limericks:
    S = 24 five-line limericks presented on screen;
    R = write in last line of each after total presentation.

11. Memory for Male Voices:
    S = technical definitions read;
    R = indicate whether that voice has been heard previously.

12. Memory for Pitch Gestalt. The tonal memory test, Seashore Tests of Musical Talent,
    Series A:
    S = 30 pairs of short melody sequences of varying length;
    R = indicate which note has been changed in second sequence.

13. Motor Rhythm. The rhythm test, Seashore Tests of Musical Talent, Series A:
    S = 30 pairs of patterns of tapping;
    R = judgment ‘‘same’’ or ‘‘different.’’

14. Musical Rhythm:
    S = musical phrases produced on the piano;
    R = indicate whether played in 2-, 3-, 4-, or 8- time.

15. Pitch Discrimination for Complex Sounds:
    S = 30 pairs of complex sounds;
    R = judgment ‘‘higher’’ or ‘‘lower.’’

16. Pitch Discrimination for Pure Tones. The pitch test, Seashore Tests of Musical Talent,
    Series A:
    S = 50 pairs of pure tones;
    R = judgment ‘‘higher’’ or ‘‘lower.’’

17. Pitch Discrimination for Tones of Short Impulse:
    S = pairs of short impulse tones generated electrically;
    R = judgment ‘‘higher’’ or ‘‘lower.’’

18. Pitch Discrimination for Vocal Sounds:
    S = pairs of monosyllabic sounds containing vowels and diphthongs;
    R = judgment ‘‘higher’’ or ‘‘lower.’’

19. Quality Discrimination for Complex Tones. The timbre test, Seashore Tests of Musical
    Talent, Series A:
    S = 50 pairs of tones;
    R = judgment ‘‘same’’ or ‘‘different.’’

20. Rapid Spelling:
    S = 20 words spelled rapidly;
    R = write word after it has been spelled.

21. Sense of Time for Intervals of Silence. The time test, earlier form of Seashore Tests of
    Musical Talent:
    S = 50 items of 3 successive clicks;
    R = judgment which interval of silence is longest.

22. Sense of Time for Sound-Filled Intervals. The time test, Seashore Tests of Musical Talent,
    Series A:
    S = 50 pairs of pure tones;
    R = judgment ‘‘longer’’ or ‘‘shorter.’’

23. Sensory Masking:
    S = passage read while buzzing noise is presented as distractor;
    R = write words missing in text while passage is being read.

24. Singing:
    S = phrases sung to piano accompaniment;
    R = write word of phrase.

25. Sound Breakdown:
    S = 50 words spoken in a varying number of voices from 1 to 5;
    R = indicate the number of voices.

26. The Pitch-Loudness Function:
    S = 30 pairs of tones;
    R = judgment which is louder.

27. Accent (sense words):
    S = 25 words;
    R = indicate which syllables are accented.

28. Accent (nonsense words):
    S = 25 nonsense words;
    R = indicate which syllables are accented.

29. Diphthongs and Digraphs:
    S = 33 diphthongs or digraphs each followed by 4 words;
    R = indicate which word contains the sound presented.

30. Letter Blending:
    S = 23 single words ranging from two to eight separate sounds, sounded out;
    R = pronounce word after it has been sounded.

31. Letter Names (mixed position, double element sounds):
    S = 60 words represented in pictures and 15 sounds;
    R = indicate which picture name in each group of 4 pictures contains the sound presented.

32. Letter Sounds (initial position, single element):
    S = 132 words represented in pictures and 33 single-element letter sounds;
    R = indicate which picture name in each group of four pictures begins with the sound
        presented.

33. Letter Sounds (medial position, single element):
    S = 128 words represented in pictures and 32 single-element sounds;
    R = indicate which picture name in each group of four pictures contains the sound
        presented.

34. Letter Sounds (final position, single element):
    S = 88 words represented in pictures and 22 single-element sounds;
    R = indicate which picture name in each group of four ends in the sound presented.

35. Reversals (easy vocabulary):
    S = 34 pairs of words pronounced separately, one member of which contains a reversal;
    R = judgment ‘‘real’’ or ‘‘unreal.’’

36. Reversals (difficult vocabulary):
    S = 27 pairs of words pronounced separately, one member of which contains a reversal;
    R = judgment ‘‘real’’ or ‘‘unreal.’’

37. Sense-Nonsense Phrases:
    S = 28 phrases some containing only real words, others containing nonsense words
        selected to be like common phrases in sound and character;
    R = judgment ‘‘real’’ or ‘‘unreal.’’

38. Sense-Nonsense Words (easy vocabulary):
    S = 29 items each consisting of one real word and two similar but unreal words;
    R = judgment which is real word.

39. Sense-Nonsense Words (difficult vocabulary):
    S = 23 pairs of words, one an English word of very difficult vocabulary level and one a
        nonsense word containing combinations of sounds not appearing in such sequence in
        English;
    R = judgment ‘‘real’’ or ‘‘unreal.’’

40. Syllable Blending (easy vocabulary):
    S = 21 words, the syllables sounded out in proper sequence;
    R = pronounce word after it has been sounded.

41. Syllable Blending (difficult vocabulary):
    S = 17 words, the syllables sounded out in proper sequence;
    R = pronounce word after it has been sounded.

42. Syllabication (sense words):
    S = 23 pairs of words;
    R = judgment which has most syllables.

43. Syllabication (nonsense words):
    S = 33 pairs of nonsense words;
    R = judgment which has most syllables or ‘‘same.’’


THE PROCEDURE 





Preparation of the Tests 


Raw scores for all the tests administered 
by Karlin were provided by him. The writer 
devised or adapted her tests and presented 
them for disc recording in her voice. In some 
instances, part of the instructions were put on 
the records. The record discs are of acetate 
on glass and were recorded by E. A. Ewers 
on apparatus which he equalized so that the 
frequency curve obtained was flat within plus 
or minus 1 1/2 decibels from 30 to 10,000 cy- 
cles. The reproducing apparatus was assem- 
bled by E. A. Ewers and the published curves 
of the loud speaker lead to the assumption that 
approximately full reproduction of the sounds 
which appear on a disc is obtained. All of the 
tests devised by Karlin were recorded on the 
same apparatus. An uncontrolled variable ex- 
ists in that the two sets of tests were not ad- 
ministered on the same reproducing apparatus. 
Neither the tests prepared by Karlin nor those 
prepared by Ewers were standardized nor item 
analyzed. 

Since disappearance of high frequencies in- 
terferes with ability to distinguish fricatives, 
in Ewers’ experiment, the original discs were 
reserved and copies were made from these for 
use in testing. A new copy was substituted 
every ten playings in order to obviate the dis- 
appearance of the high frequencies as scratch 
level rose. 


Presentation of the Tests 





All of the auditory tests were presented in 
a quiet classroom about 26 by 34 by 14 feet in 
dimensions. Chairs of the conventional school- 
room side-arm type were placed in the form 
of a triangle with the apex of the triangle im- 
mediately in front of the loud speaker. Judg- 
ment was relied upon to determine that sound 
reached each seat equally well. From 15 to 
25 subjects were tested at a time. Karlin kept
his subjects in the same chairs throughout the 
administration of his tests, but states that he 
found no evidence of different reception conditions
in various parts of the room. Subjects
for Ewers’ tests were permitted to seat themselves
from day to day as they pleased. For
the most part, they returned each testing ses- 
sion to the chairs they had originally chosen. 
The subjects came to the tests during forty- 
minute study periods. Instructions as to pro- 
cedures were given by the examiner. The sub- 
jects were instructed to guess when they were 
in doubt as to the correct answer to a question. 
An answer booklet containing pictures used 
in some of the tests and answer spaces in which 
the subject had to mark between dotted lines 
representing the correct answer or to under-
line the word ‘‘same’’ or ‘‘different,’’ etc.,
was prepared for use with the auditory tests 
and was reproduced by planographing. The 
booklets were collected after each test admin- 
istration and were returned to the subjects at 
the next test session. A supplementary data
booklet for the examiner’s use was also pre- 
pared which contained the list of words used 
in the letter blending and syllable blending tests, 
space for recording educational and health data, 
a speech rating, etc. A third booklet contain- 
ing standardized questionnaires regarding bi- 
lingualism and handedness and others regard- 
ing home situations, preferences, etc., and 
some drawings for motor tests was prepared 
and the questionnaires and tests were admin- 
istered. Some of the information obtained this 
way is used in the analysis presented later. 


Statistical Treatment of the Testing Results 





The Chi-square test of independence of var- 
iables was applied to each auditory variable in 
relation to each of the two reading tests. This 
technique was chosen as a quick way to deter- 
mine whether or not any relation exists between 
each of the pairs of variables. Scores above 
the median of a particular grade were put into 
one group and those below the median of that
grade were put into another group and all 
grades were combined in these two categories 
for the purpose of computing the Chi-square 
values. The number of cases varies from dis- 
tribution to distribution. The distributions of 
test scores do not all have the same shape. 
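
The median-split procedure just described can be sketched in code. This is a minimal illustration under stated assumptions, not a record of the original computation: the function names and the sample scores are hypothetical, a score exactly at the grade median is counted as ‘‘not above’’ (the text does not say how ties were handled), and no continuity correction is applied. The value 3.84 is the standard tabled 5 percent point of Chi-square for one degree of freedom.

```python
from collections import defaultdict
from statistics import median

CRITICAL_5_PERCENT_1_DF = 3.84  # standard tabled chi-square value, 1 d.f.


def above_grade_median(records):
    """Flag each (grade, score) record as above the median of its own grade.

    Assumption: a score equal to the grade median counts as not above it.
    """
    scores_by_grade = defaultdict(list)
    for grade, score in records:
        scores_by_grade[grade].append(score)
    grade_median = {g: median(s) for g, s in scores_by_grade.items()}
    return [score > grade_median[grade] for grade, score in records]


def chi_square_2x2(flags_a, flags_b):
    """Chi-square test of independence for two above/below classifications.

    Returns the 2 x 2 table of observed counts and the chi-square value
    (no continuity correction). A zero marginal total raises ZeroDivisionError.
    """
    observed = [[0, 0], [0, 0]]
    for a, b in zip(flags_a, flags_b):
        observed[int(a)][int(b)] += 1
    n = sum(map(sum, observed))
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return observed, chi2


# Hypothetical (grade, score) pairs for one auditory test and one reading
# test, listed in the same subject order. All grades are pooled after the
# split, as in the text.
auditory = [(7, 1), (7, 2), (7, 3), (7, 4), (8, 5), (8, 6), (8, 7), (8, 8)]
reading = [(7, 10), (7, 20), (7, 30), (7, 40), (8, 50), (8, 60), (8, 70), (8, 80)]

table, chi2 = chi_square_2x2(above_grade_median(auditory),
                             above_grade_median(reading))
print(table, round(chi2, 2), chi2 >= CRITICAL_5_PERCENT_1_DF)
```

Splitting within each grade before pooling keeps a subject in an upper grade from being counted as ‘‘above the median’’ merely because older pupils score higher on both tests.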

Next, since the variables under considera-
tion are continuous variables, the Pearson
product-moment correlation coefficients were
computed and a test of significance was applied
to them. The relative deviate k of r in the
normal distribution of zero mean was calcula-
ted ($k_r = r\sqrt{N - 1}$), and the 5 percent level of
significance, a relative value of the deviate k
of r of 2 or more, was accepted as the demar-
cation point of significant deviations.
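
The significance test just described can be sketched as follows. This is an illustration only: the function names are hypothetical, the number of cases (138) is an assumed figure chosen for the example since the number of cases varies from test to test, and taking the absolute value of the deviate is an assumption made so that negative coefficients are treated like positive ones.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relative_deviate(r, n):
    """k_r = r * sqrt(N - 1), the deviate of r when the true correlation is zero."""
    return r * math.sqrt(n - 1)


def is_significant(r, n, threshold=2.0):
    """Criterion used in the text: a deviate of 2 or more (absolute value assumed)."""
    return abs(relative_deviate(r, n)) >= threshold


N_ASSUMED = 138  # hypothetical number of cases, for illustration only
for r in (0.70, 0.19, 0.13):
    print(r, round(relative_deviate(r, N_ASSUMED), 2), is_significant(r, N_ASSUMED))
```

With the assumed 138 cases, a coefficient of .70 gives a deviate of about 8.19, a coefficient of .19 passes the criterion, and one of .13 does not.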


THE RESULTS 


The values of the correlation coefficients 
and of the test of significance of these values 
are presented in Table I. 


ANALYSIS AND PSYCHOLOGICAL INTERPRETATION


With few exceptions, the correlation coef- 
ficients are low. Using group techniques, one 
can expect to find a high correlation coefficient 
only when the same ‘‘causal’’ factor is affect- 
ing all of the members of the group at the same 
time. 

Realizing that it is necessary to depend up- 
on a large number of small positive correlation 
coefficients for the answer to some kinds of 
questions, it is possible to arrange the obtain-
ed coefficients presented in Table II in hierar-
chies from highest to lowest and to break the
hierarchies to facilitate discussion. Table II 
shows the two hierarchies and the points at 
which breaks occur. 

Having thus grouped the tests, each group 
can be examined in the light of existing know- 
ledge and of logical surmises. In the follow- 
ing instance, where possible, these groupings 
are compared with those obtained by Karlin by 
means of the factorial methods of procedure. 
He applied factor analysis to a large number 
of tests and found nine factors. One of these 
did not lend itself to psychological interpreta- 
tion. Of the remaining eight, four appeared 
to be true auditory abilities and four appeared 
to be central processes which take place ‘‘ir-
respective of sense modality,’’ according to
his analysis. Tests correlated with the Iowa 
Reading Test are considered first and then 
those correlated with the Gray test. For the 
reader’s convenience, an abstract of Karlin’s 
rotated factorial matrix showing factor loadings 
of 20 and above is presented in Table IV. 


                                         r      kr
17. Pitch Discrimination for
    Tones of Short Impulse              .70    8.19

This test yields the highest correlation
coefficient obtained between any of the
auditory tests and the Iowa reading test. No
other pitch test stands near it. Were pitch the 
outstanding factor operating here, it is to be 
expected that other pitch tests would appear 
since nine tests define Karlin’s Factor FI, Fre-
quency Integration.
Of those nine tests, test 17 (see above), 18
Pitch Discrimination for Vocal Sounds, and 20
Rapid Spelling account for the second, third,
and fourth highest loadings on the factor. Test
18 correlates .19, and test 20 correlates .30
with the reading test. Both of the coefficients
are significant at the 5 percent level. The re-
maining correlation coefficients obtained be-
tween the auditory tests defining the Frequency
Integration factor and the Iowa reading tests
are .08, .13, .04, .02, .11, and .13 for tests
numbered 19, 13, 16, 22, 15, and 26 respec-
tively. None of these is significant at the 5
percent level.

The most obvious interpretation is that the 
time element rather than the pitch element is 
the important one. This is borne out by the 
fact that tests 18 and 20, in addition to test 17, 
involve stimuli of quite short duration when 
compared to other tests defining the factor. It 
is entirely reasonable to assume that inability 
to detect very short stimuli may interfere in 
the process of learning to read—particularly 
when the method of teaching is that of phonics, 
for many of the sounds of the symbols used in 
written language are extremely short as well 
as close in pitch; for example: short vowel 
sounds such as are contained in one-syllable 
words (‘‘ih’’ and ‘‘eh’’), or fricative sounds 
such as ‘‘f’’ and ‘‘th.’’ Evidence, based on per- 
sonal observation, that shortness does interfere 
in the case of some individuals may be drawn 
from the following report: Among the remedial 
reading cases taught by the author was a boy 
who could not distinguish between the short 
vowel sounds. Not until they were long drawn 
out in time was he able to hear and finally to 
reproduce the different sounds. In addition, 
anyone who has seen the writings of an educated 
person deaf from birth, knows that such kinds 
of words as conjunctions, for example, are us- 
ually omitted. These are the words customar- 
ily slurred in speech and certainly not ‘‘mouth- 
ed’’ as are other words—a factor of detriment- 
al influence to the lip reader and probably also 
to individuals not classified as deaf but lacking 
some one or more of the auditory abilities. 

Perhaps more evidence that the time elem- 
ent is the important one here is in the fact that 
while the correlation between the Pitch Discrim- 
ination for Tones of Short Impulse test and the 
Iowa test is of the degree of .70, the correla- 
tion between the auditory test and the Gray read- 
ing test is only .26. The correlation between 
the two reading tests is .55. While the Iowa 
test is intended to be essentially a power test, 
the fact remains that it is timed throughout. In 
the case of the Gray test, however, the person 
is permitted to read at his own rate. In this 
experiment, the knowledge that he was being 
timed at all was kept from him. The ability to 
detect and to identify stimuli of short duration 
is probably a central one. The whole problem
may be part of the physiological problem of re-
action time and perhaps the psychological prob-
lem of attention.


GROUP II                                 r      kr
40. Syllable Blending
    (easy vocabulary)                   .60    7.07
41. Syllable Blending
    (difficult vocabulary)              .50    5.90

Test 41 is a continuation of test 40. One as- 
pect of these tests to be noted is that they differ 
from the other tests with the exception of 30 and 
the Gray reading test, in requiring the subject 
to reproduce the sounds. The test is in this re- 
spect less a ‘‘pure’’ test than the others. 

It may be that the ability to blend syllables
is an artificial one dependent upon learning.
Since none of Karlin’s tests stand with these
tests, it is not possible to approach
interpretation through his factors. Perhaps some
idea of the character of the abilities required
may be obtained from a table of Chi-squares not
reproduced in this thesis. The Chi-square values
between each test administered and every other
test were computed, and those of the value at
or above the 5 percent level of significance
taken from Fisher’s table were charted. The
information shown on the chart for these tests
is as follows:


SYLLABLE BLENDING AND                      χ² 40   χ² 41
Bilingualism1  17.88
Iowa subtests
  Word Meaning (Mathematics)  .14
  Word Meaning (English)  .31
  Paragraph Comprehension (Central Idea)  .49
  Paragraph Comprehension (Development)  .48
  Use of Index  .14
Iowa Percentile Rank  .72
Gray Oral-Errors  .42
Gray Oral-Time  .23
4.  Illogical Grouping  .78
    Selectiveness
    Seashore Loudness
    Short Impulse Loudness
10. Limericks
18. Vocal Pitch
20. Rapid Spelling
21. Seashore Unfilled Time
23. Impure Tone Masking  .09
29. Matching  .68
30. Letter Blending  .67
32. Discrimination (initial sounds)
34. Discrimination (final sounds)  .51
35. Reversals (easy vocabulary)  .67
36. Reversals (difficult vocabulary)  .67
Non-language I. Q.  .05

1. Measured by Ewers’ adaptation of the Hoffman Bilingual Schedule. Used by
permission of the publishers. See Moses N. H. Hoffman, The Measurement of
Bilingual Background (New York: Bureau of Publications, Teachers College,
Columbia University, 1925).


The fact that the chances are great that
there is a relation between bilingualism and the
ability to blend syllables into words indicates
that this ability is a learned one probably not
well established in individuals who have a
bilingual background. The Chi-square table is
reproduced below and shows that the relation
exists in that direction:


                     Ability to blend syllables
                         −          +
Non-bilingual           19         39         58
                       (27)       (31)
Bilingual               46         36         82
                       (30)       (52)
                        65         75        140


The idea of a non-well established habit
seems to be confirmed by the indication of a
relation between the word meaning subtests and
the paragraph comprehension and use of index
subtests of the Iowa reading test, and the
syllable blending test. Certainly, understanding
the vocabulary is not the common element in
the two situations since twenty-one words in
test 40 were taken from a spelling book
prepared for the eight grades. That vocabulary
is not the important element operating is
further confirmed by the fact that no relation is
shown between test 41 and the mathematics
word meaning and paragraph comprehension
development subtests of the Iowa reading test.


CORRELATION COEFFICIENTS FROM TABLE II ARRANGED IN
HIERARCHICAL FORM FOR THE IOWA READING TEST (SILENT)
AND THE GRAY READING TEST (ORAL)

Breaks in hierarchy indicated by lines

Auditory Test   r with   Auditory Test   r with
   Number        Iowa       Number        Gray

    17        .70            36        .72
  ---------------          ---------------
    40        .60            12        .55
    41        .50          ---------------
  ---------------            40        .48
    10        .45            27        .47
  ---------------            41        .46
    29        .40            28        .45
    23        .39            37        .44
  ---------------            10        .42
    27        .36             2        .42
    37        .35            23        .41
    33        .33             5        .40
    20        .30            29        .38
     4        .30            33        .36
    34        .30            34        .36
    31        .28          ---------------
     5        .26            31        .32
    28        .26             9        .31
  ---------------            32        .31
     7        .23          ---------------
    42        .23            43        .28
    30        .23            17        .26
    35        .21            20        .26
     6        .20             4        .26
     8        .20             7        .25
    43        .20            42        .24
    18        .19            16        .20
    32        .19            30        .19
    21        .18            35        .19
    14        .18            14        .19
    38        .17             6        .18
  ---------------            26        .18
     9        .15             8        .17
    26        .13            18        .13
    13        .13            21        .12
    24        .12            15        .10
     2        .12            19        .09
    39        .11            24        .06
    15        .09            25        .06
     3        .09            39        .06
     1        .08            38        .05
    19        .06             1        .05
    25        .04             3        .03
    12        .04            11        .02
    16        .03            13          0
    11        .02            22          0
    22          0
    36       -.06

It is possible that the interpretation of the 
correlation between the Gray errors test and 
the syllable blending test may be quite different. 
Probably this relation is one of inability to re- 
produce blends since an incorrect pronunciation 
of a blend of letters appearing in the Gray test 
is counted as an error. Here are two non-
‘‘pure’’ tests.

The relation between the Gray time test and 
the syllable blending test is as easily explained. 
The syllable blending test may operate as a 
timed test for many or even all of the subjects 
since no experiment was performed prior to 
this administration to determine safe time al-
lowances to make the test approximate an un-
timed one. The syllables may have been too
closely or too widely spaced in time. The per- 
iod allotted for making a response may have 
been too short. The poor readers took more 
time to read the oral material than did the good 
readers; and also, the poor readers did not 
blend as many syllables as the good readers 
did. 

Of the remaining information charted, that 
of greatest psychological interest concerns the 
limericks test, 10, and the impure tone mask-
ing test, 23. These two tests contribute the
two highest loadings on Karlin’s Factor IC, In-
cidental Closure. The limericks test requires
the subject to write in the last line of a number 
of five-line limericks after all of them have 
been presented on a screen. The impure tone 
masking test requires the subject to write in 
words missing from a written passage while 
the passage is read in its entirety but with a 
continuous buzzing noise in the background in- 
creasing in intensity as the test proceeds. This 
task requires ability to pick particular sounds 
out of a framework of sounds, and it is likely 
that the same ability is required in the syllable 
blending test. It is conceivable that for some 
subjects the situation of having to pick out 
sounds presented within a framework of other 
sounds is analogous to that of the child who hes- 
itated over a word and then exploded when the 
teacher pronounced it for him: ‘‘I know that 
word, but this one is different; it has these 
little marks around it!’’2 However, all three
tests (syllable blending, impure tone masking,
and limericks) require that individuals bring
sounds together across space. It is this fact 
that is of greatest interest to the problem of 
this study. That these two ‘‘groups’’ apparently
exist in this population, one group which
does not know what it is to listen for, and one
group which cannot bring together that which it
hears, seems to corroborate and further
elaborate Monroe’s (18) findings that two groups
exist in the reading defect group.


GROUP III                                r      kr
10. Memory for Limericks                .45    5.22

The artificiality of the breaking ‘‘system’’
appears here. This test probably belongs with
those in Group II. Memory for Limericks
appears on two of Karlin’s factors. It contributes
a loading of -9 to Factor AR, Auditory
Resistance, and one of 51 to Factor IC, Incidental
Closure. The fact that two memory tests
appear on the latter factor with Memory for
Limericks suggests that memory is an element in
it, but immediate memory is not often required
in the Iowa test. Intelligence also appears on
the factor. However, it has been reasonably
controlled in this experiment. Karlin believes
that all of the tests on Factor IC require an
ability to reproduce crucial stimuli, either
immediately or after some delay. This is, in his
words, a ‘‘closure effect transcending sense
modality.’’ Certain portions of the Iowa test are
analogous to that situation: for example, those
parts in which the subject is required to read a
passage and later to pick out the proper answer
to questions about the passage when those
answers have been given in the passage. But it is
possible to go beyond that. In doing the subtests
of the Iowa test, the person who recognizes
quickly when he has found the answer operates
at a distinct advantage over other persons.
Ability to do so may be related to ability to
scan in reading which emerges easily in some
people and is acquired with difficulty, if at all,
by others. Scanning may be a talent over and
above any background in a particular subject.


GROUP IV                                 r      kr
29. Diphthongs and Digraphs             .40    4.64
23. Sensory Masking                     .39    4.49

Here again the ‘‘breaking system’’ used
seems artificial. Had a better system for
grouping the tests been used, it is likely that
the tests in Groups II, III, and IV would appear
together. The sensory masking test requires
that a particular sound be heard when it is
within a framework of other sounds. The diphthongs
and digraphs test requires essentially the same
ability: the sound of a diphthong or digraph is
presented and then several words are
pronounced, one of which contains the same
diphthong or digraph, and the subject marks the
space that stands for the word containing the
same sound as the one originally presented. The
subject who can give the instantaneous response:
‘‘There that’s it’’ and mark his paper
immediately has an advantage over the one who must
try to hold all the sounds in mind and make
mental comparisons before deciding. The fact
that other tests which require the comparison
of sounds do not appear here need not be cause
for concern, but rather lends support to the
idea that what is important in this situation is
ability to ‘‘close,’’ not ability to recognize
that two sounds are the same. Karlin calls
attention to the fact that Factor IC, Incidental
Closure, correlates highly with Factor SC, Speed
of Closure. However, it is the ability to do the
task that is important in the situation
circumscribed by this experiment; for the person who
can do the task easily operates at such a
distinct advantage over the person who cannot that
time is a negligible element.

2. His first experience with quotation marks.

GROUP V                                  r      kr
27. Accent (sense words)                .36    4.16
37. Sense-Nonsense Phrases              .35    4.11
33. Letter Sounds (medial
    position, single element)           .33    3.84
20. Rapid Spelling                      .30    3.39
4.  Illogical Grouping                  .30    3.44
34. Letter Sounds (final
    position, single element)           .30    3.49
31. Letter Names (mixed
    position, double element)           .28    3.26
5.  Intellective Masking                .26    2.99
28. Accent (nonsense words)             .26    3.01





Analysis of the tests in this group reveals 
that a complexity of abilities is operating. All 
of the tests appearing here which were rotated 
by Karlin—20, 4, and 5, appear on Factor AR, 
Auditory Resistance; two—20, and 5 appear on 
Factor SC, Speed of Closure; one, Rapid Spel- 
ling 20, appears on Factor FI, Frequency Inte- 
gration, while Illogical Grouping 4, appears on 
Factor AI, Auditory Integral for Perceptual 
Mass. It is reasonable to suppose that the other 
tests which appear here contain one or all of the 
elements of the tests which were rotated. How- 
ever, analysis of the total group with respect to 
the correlations with the reading test leads one 
to believe that the element in common is famil- 
iarity with the pattern of sound of the American 
language.3 A sheer rote knowledge of words 
relatively commonly spoken in that language 
should greatly facilitate accomplishment of the 
tasks set in these auditory tests and to a lesser 
extent those in the reading test. All of these 
tests are so constructed that no ‘‘understanding’’ 
of the concepts contained in the language is re-
quired in order to do them. A foreign-born
person with no knowledge of the American lang-
uage should be able to do the accent tests pro-
vided that the instructions concerning what he
is to look for and how he is to respond are given
to him in his own language. A person knowing
only the names of objects and not the written 
symbols for those names can do the letter sound 
tests and the sounds of letter names test since 
these are administered by means of pictures. 
And, conceivably, a person who has been in an 
environment of American language for awhile, 
even though he could neither read nor speak it, 
can do the sense-nonsense phrases test on the 
basis of whether or not he has heard those par- 
ticular combinations of sounds (prite n bark)4 
much as some people who neither speak nor 
read French may be able to tell whether a word 
is French or not, and possibly even place it as 
a German word or a Swedish word or a Welsh 
word without being familiar with other than the 
sounds of those languages either.

Ability to encompass whole patterns of 
sound combinations such as belong to the var- 
ious languages of the world is an ability that is 
certainly largely acquired. It is of definite 
advantage in learning a particular language to 
attain a familiarity with what might be called 
the ‘‘melodic pattern’’ of that language, pro- 
vided that two or more languages do not inter- 
fere with each other. When habits of thinking 
in terms of the sounds of any one language are 
not definitely established, then the presence of 
knowledge of another language is detrimental 
both in the ‘‘reading’’ and the ‘‘reproducing’’ 
situation. Whether or not ability to learn pat- 
terns of sounds is a native ability possessed by 
some individuals such as those who become 
linguists, and not by others, as is musical abil- 
ity so far as is known, cannot be stated now. 
The situation should be explored further; but 
before that can be done, better tests must be
constructed. From the standpoint of discrim- 
ination, none of the tests in this group is a 
good measuring instrument as can be seen from 
the following histograms in Figure 1. Among 
the tests represented, only tests 4 and 5 dis- 
criminate all along the length of the continuum; 
and the frequency distribution of test 4 is almost
rectangular, while test 5 does not discriminate 
well enough at either end. 


3. American usage and pronunciation of English.

4. Light and dark.


Figure 1
Histograms of the distributions of frequencies of scores of tests in
Group V, indicating lack of discrimination in these tests.


GROUP VI                                 r      kr
7.  Loudness Discrimination
    for Pure Tones                      .23    2.71
42. Syllabication (sense
    words)                              .23    2.70
30. Letter Blending                     .23    2.71
35. Reversals (easy
    vocabulary)                         .21    2.46
6.  Loudness Discrimination
    for Complex Tones                   .20    2.35
8.  Loudness Discrimination
    for Pure Tones of Short
    Impulse                             .20    2.35
43. Syllabication (nonsense
    words)                              .20
18. Pitch Discrimination
    for Vocal Sounds                    .19
32. Letter Sounds (initial
    position, single element)           .19
21. Sense of Time for
    Intervals of Silence                .18
14. Musical Rhythm                      .18
38. Sense-Nonsense Words
    (easy vocabulary)                   .17    1.98

The element in common to most of the tests
which appear in this grouping seems to be that
of loudness. A secondary element may be that
of the ‘‘pattern ability’’ mentioned in the
discussion of Group V.

A priori, correlations of the value .23 or less
are hardly worth considering. However, in
this instance the test of significance indicates
that a relationship exists. The low degrees of
correlation with the reading test probably
indicate that the tests used were not well
constructed.


GROUP VII                                r      kr
9.  Memory for Emphasis                 .15    1.68
26. The Pitch-Loudness
    Function                            .13    1.52
13. Motor Rhythm                        .13    1.52
24. Singing                             .12    1.35
2.  Auditory Fusion
    Memory Span                         .12    1.36
39. Sense-Nonsense Words
    (difficult vocabulary)              .11    1.28
15. Pitch Discrimination
    for Complex Sounds                  .09    1.05
3.  Haphazard Speech                    .09    1.03
1.  Auditory Discrimination
    for Vowels and Consonants           .08    0.91
19. Quality Discrimination for
    Complex Tones                       .06    0.69
25. Sound Breakdown                     .04    0.46
12. Memory for Pitch
    Gestalt                             .04    0.45
16. Pitch Discrimination
    for Pure Tones                      .03
11. Memory for Male Voices              .02
22. Sense of Time for Sound-
    Filled Intervals                      0
36. Reversals (difficult
    vocabulary)                        -.06    0.70

None of the tests in this group correlate
significantly with the reading test. Some
information is to be gained, however, from the tests
which appear here. Among them are included
all of the tests which appear on Factor FI,
Frequency Integration; tests: 19, 13, 16, 22, 15,
and 26, with the exception of the three which
account for the second, third, and fourth
highest loadings on that factor and these are the
short impulse test 17, the vocal pitch test 18,
and the rapid spelling test 20, which as has
been stated lends support to the theorizing that
shortness of stimuli in time is the important
factor in this reading situation rather than
pitch discriminatory ability. Stated differently,
perhaps it is possible to say that fewer people
will lack the ability to distinguish between the
different frequencies of sounds which appear
in the American language than will lack the
ability to hear a stimulus of short duration or
hear the difference between two pitches when
both are sounded very quickly. It is entirely
possible that the low correlations are
explained by poor tests.

Three of the seven tests which appear on
Factor L, Loudness, appear here also: 26, 3,
and 13.

Tests 9, 1, and 25 appear on Factor AI,
Auditory Integral for Perceptual Mass.

Test 2 appears on Factor ASF, Formation
of Auditory Span.


B. Groupings of Tests Correlated with the
Gray Test


Before considering the Gray oral reading
test, it is necessary to remember that this
test does not measure the amount of any ability
or abilities present; and to know that in this
study, the scale is reversed so that a high
score indicates few errors. Since it does not
measure, it has little to contribute to the
solution of the problem presented here. However,
it is possible to determine the auditory
elements necessary to correctly read orally as
opposed to those necessary to correctly read
silently by comparing the correlation coefficients
obtained on the silent reading test and the
auditory tests with those obtained on the oral
reading test and the auditory tests since
systematic variation occurs. Three-fourths of the time
the oral reading coefficients are the same or
lower than the silent reading ones. In
twenty-three instances, the coefficients obtained with
the oral test are lower or the same. In
twenty instances, the coefficients obtained with the
oral test are higher with the exception that in
seven cases the coefficients are so close in
value as to be approximately the same. Those
seven are as follows: Test 7, Loudness
Discrimination for Pure Tones, Iowa .23, Gray
.25; Test 15, Pitch Discrimination for
Complex Tones, .09, .10; Test 19, Quality
Discrimination for Complex Tones, .06, .09;
Test 23, Sensory Masking, .39, .41; Test 31,
Letter Names, .28, .32; Test 33, Letter
Sounds, .33, .36; and Test 42, Syllabication,
.23, .24. Those with which the correlation
with the Gray test is higher than that with the
Iowa test are as follows:

                                      r Iowa   r Gray
2.  Auditory Fusion Memory
    Span                                .12      .42
5.  Intellective Masking                .26      .40
9.  Memory for Emphasis                 .15      .31
12. Memory for Pitch Gestalt            .04      .55
16. Pitch Discrimination for
    Pure Tones                          .03      .20
26. The Pitch-Loudness
    Function                            .13      .18
27. Accent (sense words)                .36      .47
28. Accent (nonsense words)             .26      .45
32. Letter Sounds (initial
    position, single element)           .19      .31
34. Letter Sounds (final
    position, single element)           .30      .36
36. Reversals (difficult
    vocabulary)                        -.06      .72
37. Sense-Nonsense Phrases              .35      .44
43. Syllabication (nonsense
    words)                              .20      .28
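
The comparison of the two sets of coefficients described above can be sketched in code. This is an illustration only: the function name is hypothetical, and the tolerance of .04 for ‘‘approximately the same’’ is an assumption, since the text states no numerical rule. The five coefficient pairs are those quoted in the text for tests 7, 17, 26, 31, and 36.

```python
def classify_pairs(pairs, tolerance=0.04):
    """Sort tests into three lists by comparing r with Gray against r with Iowa.

    pairs maps a test number to (r_iowa, r_gray). The tolerance is an assumed
    cut-off for "approximately the same"; the text gives no numerical rule.
    """
    result = {"approximately_same": [], "gray_higher": [], "gray_lower": []}
    for test in sorted(pairs):
        r_iowa, r_gray = pairs[test]
        # Coefficients are reported to two places, so round the difference
        # to avoid floating-point noise at the cut-off.
        difference = round(r_gray - r_iowa, 2)
        if abs(difference) <= tolerance:
            result["approximately_same"].append(test)
        elif difference > 0:
            result["gray_higher"].append(test)
        else:
            result["gray_lower"].append(test)
    return result


# (r with Iowa, r with Gray) for five tests, as quoted in the text.
pairs = {
    7: (0.23, 0.25),
    17: (0.70, 0.26),
    26: (0.13, 0.18),
    31: (0.28, 0.32),
    36: (-0.06, 0.72),
}
print(classify_pairs(pairs))
```

Under this assumed tolerance, tests 7 and 31 fall among the approximately equal pairs and tests 26 and 36 among those higher with the Gray test, which agrees with the two lists given above.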


Here can be seen the difference between the 
two reading tests. Of more importance to oral 
reading than to silent reading are memory, 
pitch, and loudness plus knowledge of the sounds 
that belong to particular symbols plus ability 
to blend sounds. Reversals affect the oral 
test (r = .72) much more than reversals affect
the silent test (r = -.06). A reversal in silent
reading may not even affect the meaning of the
context for the subject and cannot be scored as
an error. In the oral test, each reversal counts
as an error. 

Perhaps here is the place to point up the 
fact that ability to read orally is a specialized 
talent. It requires no understanding of the 
meaning of the material being read. A person 
who has made the association between the sym- 
bols and sounds of a foreign language and be- 
tween those sounds and the mental and muscu- 
lar actions required to produce them can ‘‘read’’ 
that language even though he cannot translate 
it. People who know the mechanics of reading 
can ‘‘read’’ a book on celestial mechanics; few
can understand it. Oral reading as far as
reading is concerned requires nothing more 
than a knowledge of the mechanics—sounds 
and symbols properly associated and sounds 
placed in correct sequence. 

To illustrate the statement that has been 
made concerning pass-and-fail tests as opposed 
to measuring instruments and for whatever in- 
formation pertinent to the problem of this study 
that may be gained, analysis of the same sort 
applied to the Iowa test is made. The groupings 
are as follows: 


GROUP I                                  r      kr
36. Reversals (difficult
    vocabulary)                         .72    8.46

All that can be said regarding this correla- 
tion is that the members of the experimental 
population made few reversals if they could 
read well orally. 








GROUP II                                 r      kr
12. Memory for Pitch Gestalt            .55    6.20
This test is the Tonal Memory Test in the 
Seashore Tests of Musical Talent, Series A 
and requires the subject to indicate which note 
in a series of notes has been changed when the 
two series are identical except for one note. 
On the basis of the information here, all that 
can be said is that persons who do well on the 
tonal memory test also read well orally. The 
musical series, however, may have great sim- 
ilarity to a word, the sounds within which, the 
individual reproduces mentally as an aid to 
recognition. It is conceivable that the visually- 
minded individual transfers the stimuli of the 
visual pattern directly to the association cen- 
ters of the brain while the auditorily-minded 
individual transfers the visual symbol into an 
auditory symbol before obtaining meaning from 
it. Were this true and of importance to this 
group in the reading situation, a high degree 
of correlation should appear between this aud- 
itory test and the silent reading test. However, 
the correlation is only of the degree .04. 
Nevertheless, the ability to do the task assign- 
ed in the Memory for Pitch Gestalt test may be 
an ability that is of aid in reproducing sounds. 
The Iowa test may operate as a timed test for 
auditorily-minded individuals. 


GROUP III                                r      kr
40. Syllable Blending (easy
    vocabulary)                         .48    5.61
27. Accent (sense words)                .47    5.44
28. Accent (nonsense words)             .45    5.22
37. Sense-Nonsense Phrases              .44    5.17
10. Memory for Limericks                .42    4.87
2.  Auditory Fusion Memory
    Span                                .42    4.80
23. Sensory Masking                     .41    4.72
5.  Intellective Masking                .40    4.61
29. Diphthongs and Digraphs             .38    4.43
33. Letter Sounds (medial
    position, single element)           .36    4.19
34. Letter Sounds (final
    position, single element)           .36    4.19


Analysis here leads only to the problem of lack of well-established
habits. The supplementary data previously mentioned show that the
Chi-square value between bilingualism and the Gray errors score is
not significant at the 5 percent level. However, there are Chi-square
values at that level of significance or above it between the
bilingualism questionnaire results and the results of each of the
first five tests listed. All that can be said is that perhaps
retroactive inhibition operating through the bilingual background of
half of the individuals in the group may reduce the size of the
correlation coefficients. Either persons in the group with such
background have overcome the difficulty and learned to read well in
spite of it, or the bilingualism affects their ability to read aloud
well through confusion between the two languages.


GROUP IV                                   r      kr
31. Letter Names (mixed position,
    double element sounds)                .32    3.73
9. Memory for Emphasis                    .31    3.52
32. Letter Sounds (initial
    position, single element)             .31    3.61


Examining this grouping per se, all that can 
be said is that if a person does not know the 
sound that goes with a particular symbol, he 
can hardly be expected to be able to reproduce 
that sound. However, it is likely that these
tests belong with the tests in Group III. If so,
they lend support to the thesis that lack of well 
established habits is affecting the oral reading 
ability of the experimental population. 


GROUP V                                    r      kr
43. Syllabication (nonsense words)        .28    3.27
17. Pitch Discrimination for
    Tones of Short Impulse                .26    3.04
20. Rapid Spelling                        .26    2.94
4. Illogical Grouping                     .26    2.98
7. Loudness Discrimination for
    Pure Tones                            .25    2.92
42. Syllabication (sense words)           .24    3.99


In this grouping is further confirmation of 
the idea that shortness of stimuli is of import- 
ance to the reading problem. Here are the 
tests from Karlin’s group requiring ability to 
recognize stimuli of short impulse and, here 
also are the tests of syllabication. The latter 
require the subject to identify which of two 
words in each of a series of pairs of words has 
the most syllables in it. The words are pro- 
nounced as ordinarily given in speech and range 
in length from one to six syllables. There may 
be a difference in length of several syllables be- 
tween the words of a pair or a difference of 
only one syllable consisting of a single letter. 
Only five seconds are allowed between the items 
in order to force an instantaneous response.
Viewed from one point, these tests are all 
speed tests, but since the Gray test is not a 
speed test, the fact that it is necessary to 
hurry to record the response in the syllabica- 
tion test does not appear to be the important 
one. Those tests should be analyzed from the 
standpoint of test construction. 


GROUP VI                                   r      kr
16. Pitch Discrimination for
    Pure Tones                            .20    2.33
30. Letter Blending                       .19    2.25
35. Reversals (easy vocabulary)           .19    2.22
14. Musical Rhythm                        .19    2.20
6. Loudness Discrimination for
    Complex Sounds                        .18    2.11
26. The Pitch-Loudness Function           .18    2.10
8. Loudness Discrimination for
    Tones of Short Impulse


The elements common to the tests in this group seem largely to be
those of pitch and loudness. The size of the correlation coefficients
probably indicates that deficiencies in ability to detect pitch and
loudness operate through only a few members of the experimental
population.5 That this is true may be borne out by the fact that the
Western Electric 4A Audiometer test showed no relation to any test in
the battery when the presence of such relation was judged by the
Chi-square method of analysis using the 5 percent level as the test
of significance. The audiometer test is a crude one, but results from
it serve to show that no cases of real deafness appear in the
experimental population. The fact that it is crude, however, probably
accounts for lack of relation between it and the tests appearing in
this group. It may be that the differences between many of the sounds
in these tests are greater than a difference of nine decibels, beyond
which the 4A test is supposed not to go. Fletcher’s work at the Bell
Telephone Laboratories seems to indicate that a drop of about 30
decibels is required before the person of normal hearing finds speech
difficult to understand.

5. Since this study was completed, an article entitled "Children's
Audiograms in Relation to Reading Attainment," by Sybil Henry has
appeared. See The Journal of Genetic Psychology, 1947, pp. 211-231.
She used the Maico D 5 audiometer, the reading divisions of the
Progressive Achievement Tests, the Gates Primary Reading Test and
Durrell's Analysis of Reading Difficulty. She found that for the
population under consideration, high-tone hearing loss is one of the
causes of reading deficiency and believes that it has been shown that
high-tone loss should be considered phenomena "applicable to group
rather than individual prognosis." She also found that in the reading
situation, acute hearing for the high frequencies is of more
importance than is acute hearing for the medium and low frequencies.
Finally, she found a sex difference: "The critical ratios of these
differences in per cents of males and females in these extreme groups
are reliably in favor of more girls than boys having keen hearing,
and more boys than girls having poor hearing."

Ability to discriminate pitch and loudness, 
etc., has less to do with ability to read orally 
than have some other factors. What seems to 
be important in this experimental population is 
the degree to which the association between 
particular symbols and particular sounds has 
been established. Lack of ability to read
orally is, for them, a matter of lack of training and
experience.

Summarizing, the results of this experiment
seem to indicate that good “reading” ability
involves:

1. Reacting discriminately to very short
stimuli (AI, ADI, AVI, BV).

2. Selecting particular sounds from within 
a framework of sounds (All, AIV). 

3. Closing the gaps between sounds (AII, 
All, AIV). 

4. Discriminating between frequencies (AI, 
All, BVI). 

5. Detecting varying degrees of loudness 
(All, AVI, BVI). 

6. Knowing the large pattern of sound of the 
American language (AIII, AV, AVI). 

7. Realizing there is the possibility that the 
person who is auditorily minded rather than 
visually minded may operate at a disadvantage 
in the oral reading situation in comparison with 
the individual who is visually minded (BI). 


CONCLUSIONS 


Before drawing conclusions from these results, it is well to review
the theories of audition. These theories are either central or
peripheral. Experimental evidence is available regarding pitch,
loudness, and timbre, and the analytical power of the ear. It has
been stated that,

All theories of audition assume some sort of analytical mechanism
but differ as to the nature of the process, its location, and the
specific structure involved. . . . There are, however, objections
to the central theories: (1) that no evidence exists for a real
specificity of functioning in the auditory cortex such as is
demonstrable in pattern vision in the visual cortex, (2) that no
auditory cortical lesions have been discovered, correlative with
the tonal gaps and tonal islands in persons of defective hearing,
(3) that . . . limited lesions in particular regions of the cochlea
have been found to accompany such tonal gaps, pointing toward
peripheral specificity of functioning. On the other hand, all of
the peripheral theories both resonator and non-resonator can be
considered tenable.6

Therefore, it is more logical to consider abilities such as the
ability to detect differences in pitch or differences in loudness as
peripheral abilities. No such evidence exists, however, regarding
such abilities as: (1) ability to react to stimuli of very short
duration, or (2) ability to put together two sounds not occurring
simultaneously in time, that is, grasping the relation between the
two, a closure factor. There is lack of evidence, but at this stage,
it seems logical to assume that these are central processes. That
these abilities may be related to the problems of reaction time and
attention is a useful postulate.

Of the first six “abilities” listed at the end of the last section,
the first and the third may be considered to be central abilities;
the fourth and fifth may be considered to be peripheral abilities;
and the sixth as sometimes central and sometimes dependent upon a
background of opportunity and practice, and the second as sometimes
peripheral and sometimes dependent upon background and practice.

Having shown the probable existence of two central “auditory”
processes related to the problem of reading, the purpose of the
experiment itself has been accomplished.

The second purpose of the study, however, 
has been to show, if possible, why prior exper- 
iments resulted in so little interpretable or con- 
clusive evidence. Implicit and explicit through- 
out this report has been the answer to that prob- 
lem. Summarizing, it is possible to say: 

1. In an area as complex as that of reading 
or of audition, the design of the experiment 
must go beyond one element matched with an- 
other single element or a single whole. 

2. To find the answer to such a problem as this, it is more
profitable to use measuring instruments rather than “pass-and-fail”
instruments for gathering the data.

3. When measuring instruments are used, it is possible to apply to
the data statistical techniques which cannot properly be applied to
data obtained from “pass-and-fail” instruments.


Finally, while the use of the Pearson product-moment correlation
technique and test group analysis is more justifiable for the purpose
of studying the possible relations between “reading” and “hearing”
than are some of the statistical techniques necessarily applied
previously, there is available what is considered a still better
method for the grouping of tests for psychological analysis. This is
the mathematical procedure used as a basis for factorial analysis.

The problem of hearing versus reading is one which lends itself to
the factorial method. Answers closer to truth than those presented in
this study may be obtained if a factor analysis is carried out using
a selected group of Ewers’ tests to verify and supplement a selected
group of Karlin’s tests. Additional light on the problem of the
nature of the organic processes involved can be obtained if sense
areas other than the auditory one are sampled.

6. A. G. Bills. General Experimental Psychology (New York: Longmans
Green and Co., 1934).


DISCUSSION 


While it is hoped that this study has contrib- 
uted to the body of psychological knowledge, it 
is understood that it merely points toward truth. 
Many criticisms of the work are pertinent. 

First, the design of the experiment should
have included more than two reading tests, and 
all of the tests should have been of the type that 
measure. 

Second, the tests used should have been bet- 
ter from the standpoint of interest to the sub- 
ject and from the standpoint of test construc- 
tion as well as from that of selection. While 
from a practical standpoint, item analysis in- 
cluding alternative analysis is sometimes of 
little value in terms of the job to be accom- 
plished with a particular test, it should be ap- 
plied prior to experimental use. Also, the tests 
should be standardized and those too easy or 
too difficult eliminated. Further, investigation 
should be made to determine which tests are 
operating as timed tests and which are not. 
Ideally, time should be eliminated from this 
type of study. 

In selecting or constructing the tests, very 
particular attention should be paid to the direc- 
tions to the subject and also to the mental pro- 
cesses an individual must go through for the 
sheer recording of an answer after he has it. 
Careful analysis of the language used in direc- 
tions and items must be made so that the same 
word is not used in several different senses, 
for example, or the same concept is not given 
two different names, or many words of differ- 
ent meaning and spelling but pronounced alike 
are not used contiguously. Not: “If you need to
write the word in order to spell it, it is alright
to write in the right margin of the test blank.”
Probably all of the directions should be put on
discs.

Equally careful analysis must be made of the 
task, so that the subject may know for sure, for 
example, whether he is to mark his answer as 
soon as he thinks he knows the proper response 
or is to wait until all of the alternatives have 
been presented. Directions should not be couch- 
ed in literary style. They are technical mat- 
ters. The task set throughout a battery of sim- 
ilar tests should not vary: If one test requires 
the judgment that a second tone of a pair of 
tones is or is not higher than a first one, then 
the next test should not require that a judgment 
based on the first member be made. The task 
of recording a response is often a clerical one 
requiring clerical ability of a nature not poss- 
essed in equal amounts by all persons compri- 
sing a particular group. All of the tests con- 
tributed or selected should be pure tests. If 
tests are selected from those already available, 
such tests should vary in complexity of ideas as
well as complexity of structure.7

Third, this study points to a possible relation
between the reading disability problem and
reaction time and attention not considered here- 
tofore; and to the fact that some light may be 
shed on the reading problem if the correlations 
between verbal abilities and musical abilities 
are analyzed. These relations should be inves- 
tigated. 

Fourth, further analysis of the differences 
in the correlation coefficients on the two read- 
ing tests might be of value in the educational 
situation. 

Finally, there are some practical implica- 
tions in the study confirming Bond’s (3) sugges- 
tion that it would be a valuable procedure to seg- 
regate individuals on the basis of auditory abil- 
ities early in the educational program. Results 
of the present work indicate that this might eas- 
ily be done on the basis of auditory tests which 
would indicate which beginners could more prof- 
itably be taught to read by phonic methods than 
by sight methods or vice versa. A set of discs 
could be constructed which would not require 
excessive amounts of time to administer and 
which could be scored very simply. 





7. The American Library Association, and Gray and Leary have provided
the most valuable information available with regard to these
complexities. See Pauline J. Fike, Margaret Egan, Helen H. Maclean,
Books for Adult Beginners, compiled by the staff of the Reader's
Bureau of the Cincinnati Public Library (Chicago: American Library
Association, 1935); and W. S. Gray and B. E. Leary, What Makes a Book
Readable, University of Chicago Studies in Library Science (Chicago:
University of Chicago Press, 1935).
MEASUREMENT OF SOME ASPECTS
OF CRITICAL THINKING


T. BENTLEY EDWARDS* 
Berkeley, California 


Development of the ability to do critical thinking is generally
regarded as one of the most important aims of education at all levels
and in all areas. This ability is given a high place by such
authoritative committees as the Educational Policies Commission,1 the
Harvard Faculty,2 and the American Council on Education.3

In spite of this weight of authoritative opinion, there are those4
who flatly deny that it is possible to teach pupils to think and who
hold, therefore, that development of the ability to do critical
thinking is not a legitimate aim of the schools.

At the present time there is not sufficient 
evidence to settle the argument one way or the 
other. Hence, oddly enough, the teaching of 
critical thinking in the schools is established as 
an objective by means of an authoritative, dog- 
matic statement. Accepting this objective, 
there are many teachers who try hard to im- 
prove the thinking ability of their pupils. They 
are not content to assume that proof of the ac- 
quisition of knowledge is also proof that such 
knowledge will be rightly used. Yet it is difficult
for them to point to evidence of improvement
resulting from their efforts.

At least a part of this difficulty stems from 
a lack of suitable devices to measure the inev- 
itably slow improvement in thinking ability. 
This paper describes the use of two of the new- 
er testing devices. Although these devices are 
capable of fairly wide application, they will be 
restricted in this study to content from the 
natural sciences. 


STATEMENT OF THE PROBLEM 


The problem of this study is the development
of an instrument to measure the ability of
high-school pupils to do critical thinking in the
area of science.

For the purposes of this study, critical thinking is assumed to
require abilities involved in reaching conclusions by means of facts.
Although differences in meaning are not clearly apparent, the use of
the word “critical” as an adjective is slowly gaining precedence over
the use of such descriptive terms as “reflective,” “elaborative,”
“scientific,” and “straight.” Other expressions related to critical
thinking are “understanding,” “scientific method,” and
“problem-solving.” Without going too deeply into the general and
technical use of these expressions it may be sufficient to point out
that understanding must be present before there can be critical
thinking, and that critical thinking is essential both to the
scientific method and to the correct solution of problems.


NEED FOR THE STUDY 


Use of the term, general education, seems 
to focus a growing tendency in American edu- 
cation to cut across departmental barriers, yet 
the traditional subject-matter approach re- 
mains strong. Many of the evils of the depart- 
mental approach disappear when teachers con- 
cern themselves less with their subject and 
more with how their subject may be used to 
approach goals which all pupils and teachers 
may be led to recognize as desirable. Growth 
in the ability to do critical thinking may well 
serve as one of these unifying ideals. 

The assumption that thinking skills acquired 
with the use of one kind of subject matter may 
be applied in other subject-matter areas is in 
line with modern psychological concepts re- 
garding transfer of training. 





* 2635 Hillegass Ave., Berkeley 4, California.

1. Educational Policies Commission. The Purposes of Education in
American Democracy (Washington, D.C.: National Education Association,
1938), p.

2. Harvard Committee. General Education in a Free Society.
(Cambridge: Harvard University Press, 1946), p. <67.

3. Karl W. Bigelow. "General Education," Review of Educational
Research XVII (October 1947), pp. 258-265.

4. Benjamin D. Wood and F. S. Beers. "Knowledge versus Thinking,"
Teachers College Record, XXXVII (March 1936), pp. 487-499.













264 JOURNAL OF EXPERIMENTAL EDUCATION 


According to Orata:5 “It is safe to conclude that from the standpoint
of the teacher and the school in general, the solution of the problem
of transfer of training is to train for transfer.”

There are numerous studies concerned with teaching critical thinking.
Those by Glaser in the field of language-arts,6 Thelen in
chemistry,7 Noll in science,8 and Fawcett in geometry9 are
outstanding. Each of these men concludes that it is possible to teach
the various subjects in such a way that pupils are led to think
critically about problems which concern them. These conclusions are
based in part upon the results of certain tests of the ability to do
critical thinking.

Others have made considerable progress in the development of
techniques for measurement of ability to do critical thinking:
Tyler,10 Wrightstone,11 Foust and Schorling,12 Raths,13 and many
others. Some of the test makers have confined themselves to a single
area of knowledge; others have tended to utilize pupil information of
many fields.

Most of the techniques employed result in tests which are long and
difficult to administer, and the complexity of a test is not a good
indication of its objectivity. It seems to be difficult to make a
thinking test with a high degree of reliability. One reason for the
failure to obtain high reliability may be failure to eliminate the
knowledge factor.

It has been shown that realization of an important aim of general
education requires the development of improved techniques to test the
ability to do critical thinking.


PURPOSE OF THE STUDY 


This study has for its purpose the preparation 
of instruments to measure pupil ability to think 
critically with the facts of science. Details of 
this gross design may be etched by a considera- 
tion of: content, administrability, reliability, 
validity, and interpretability. 

Content from the natural sciences was chosen largely because it is
easier to get agreement among the experts over the interpretation of
scientific material. To quote Noll,14 “The subject-matter of science
is objective, quantitative, subject to laws and principles, and more
accurately predictable.”

A major objective in the preparation of these 
tests was to find some way of setting them up 
so that they would be easy to give and easy to 
score. In a discussion of the actual test it will 
be shown that at least this objective was reach- 
ed, for during the administration of the test, 
800 were scored by three people in a single 
day without the use of a scoring machine. In 
addition, fewer than one percent of pupils ans- 
wering these tests failed to understand the 
novel procedure in spite of the fact that they 
were supervised by teachers with no special 
training in the administration of these tests. 

Statistical treatment of pupil scores on the
tests, including coefficient of reliability, will
be given later, but unless reliability is built
into a test it will not show up in the statistical 
treatment. One of the most important methods 
of building reliability into a test is to express 
the ideas involved in the words of the pupils 
themselves. Reliability is also greater when 
the test is interesting to the pupils and when 
they are able to accept its purpose. The ex- 
tent to which the new tests meet these con- 
ditions may be determined from a description 
of the manner in which the test was construc- 
ted, and from the data obtained by its use. 

To be valid, and to be capable of ready in- 
terpretation, a test must be closely tied to an 
acceptable theory of critical thinking. The 
most widely accepted theory at the present 
time is that set forth by John Dewey.15 Con- 
sideration of his famous analysis of a complete 
act of thought shows that good judgment is nec- 
essary if the problem is to be solved by intel- 
lectual processes. The thinker needs to judge 
the worth of his sources of information, the
suitability of suggested hypotheses, the extent 
of relationships between facts and principles 
and the fitness of conclusions reached. A test 
of critical thinking that proposes to measure 
with any degree of validity should therefore be, 
at least in part, a test of pupil judgment. 

It is taken as self-evident that a test of pupil 
judgment should measure pupil judgment rather 
than pupil knowledge. It is impossible, how- 
ever, to think about nothing, hence the facts 
utilized must be either as familiar as possible 
to all the pupils, or they must be equally un- 
familiar. Thinking with unfamiliar facts is ex- 
tremely difficult and seems to require a high 
degree of intelligence. Because facts are the 
medium of thought, teachers are quite right 
when they view the transmission of the heritage 
of knowledge as a most important aspect of ed- 
ucation. But a valid test of pupil judgment 
must hold the knowledge factor as constant as 
possible. 


DEVELOPMENT OF THE TESTS 


Careful examination of Dewey’s analysis of 
a complete act of thought indicates the possibil- 
ity that a number of abilities may be involved 
in the process called critical thinking. A num- 
ber of tests were therefore written and tried 
out with small groups of pupils. These preliminary
experiments indicated that at least
four of these part tests were worth developing 
to the point where they could be used with a 
large number of pupils. 

In this paper, results obtained with only two 
of these part tests will be presented. Descrip- 
tion of the preliminary experiments will, there- 
fore, be limited to those that concern these two 
part tests which will be called Part A and Part 
C. Copies of these two parts appear at the end 
of this article. 


Development of Part A 





This test was suggested by the work of
Smith.16 It consists of six groups of four general
principles of science labelled A B C D,
followed by five statements of facts. After 
each statement is a blank space for the letter 
of the corresponding principle. Each principle 
may be used more than once. Reference to the 
copy of the test will make the simple procedure 
clear. 

In the first edition of this test, facts were 
chosen which were unfamiliar to the pupils. 
This arrangement proved unsatisfactory for 
two reasons: it takes high intelligence to think 
with unfamiliar facts, and the pupil who knows 
more science has an advantage. These unfamiliar
principles were taken from the list by
Wise,17 for secondary school pupils.

In the second edition of the test, principles
were used from a selection in the literature by
Robertson,18 for elementary school pupils.
These principles proved quite satisfactory to 
test the inductive ability of college sophomores. 

A test of this kind is quite easy to prepare. Each principle is
written on a card, then a simple fact of science which is closely
associated with it is written under each principle. Principles with
their accompanying facts are sorted into six more or less homogeneous
groups. Four cards for each group are then cut with a pair of
scissors so that fact and principle are separated. An extra fact for
one of the principles chosen at random from each group is written on
a fresh half card. The four principles are shuffled and written down
for the test as A B C D, and the corresponding facts are written
1, 2, 3, 4, 5.
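
The card procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for Part A: the function name `assemble_group`, the `extra_facts` lookup, and the shuffling of the facts (the text says only that the principles are shuffled) are all hypothetical.

```python
import random

LETTERS = "ABCD"


def assemble_group(cards, extra_facts, rng):
    """Assemble one Part A group from four (principle, fact) cards.

    cards       -- list of four (principle, fact) tuples
    extra_facts -- dict mapping each principle to one spare fact
                   (hypothetical helper data, not from the article)
    rng         -- a random.Random instance
    Returns (lettered principles, numbered facts, answer key).
    """
    if len(cards) != 4:
        raise ValueError("each group needs exactly four cards")

    principles = [principle for principle, _ in cards]
    # "Cut" each card: keep the fact together with its principle for the key.
    facts = [(fact, principle) for principle, fact in cards]

    # One principle, chosen at random, receives a fifth fact.
    chosen = rng.choice(principles)
    facts.append((extra_facts[chosen], chosen))

    rng.shuffle(principles)
    rng.shuffle(facts)  # assumption: facts are shuffled as well

    letter_of = {p: LETTERS[i] for i, p in enumerate(principles)}
    lettered = [(LETTERS[i], p) for i, p in enumerate(principles)]
    numbered = [(n, fact) for n, (fact, _) in enumerate(facts, start=1)]
    key = {n: letter_of[p] for n, (_, p) in enumerate(facts, start=1)}
    return lettered, numbered, key
```

Because one principle carries two facts, the answer key for each group always uses one letter twice, which is why the directions state that a principle may be used more than once.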


Development of Part C





A number of authors have pointed out that 
one aspect of critical thinking involves the sep- 
aration of fact from opinion. A test asking pu- 
pils to do this is therefore suggested. A ser- 
ies of statements was prepared with the letters 
F O after each statement. If the statement is 
a fact the pupil is to underline the letter F. 
The chief difficulty with this test came from
scoring. On careful analysis, a group of so-called
facts turned out to be merely opinions
after all. Hence this test was discarded. Dr.
Parker19 pointed out that the question should
not be, “Is this a fact or is it an opinion?”
but rather, “Which is the better of these two
opinions?” This statement strikes at the root
test of critical thinking should be, at least in 
part, a test of judgment. 

Previous tests of judgment have, by and 
large, measured pupil ability to distinguish 
between the two ends of a scale of values. They 
have, in the main, limited the minds of the pu- 
pils to the consideration of only two categories 
—the right and the wrong. Although this lim- 
itation is most evident in the true-false kind 
of item, it is also present in the multiple- 
choice item, in the matching item, and in all 
other kinds of test items with the exception of 
expertly worded questions of the essay type. 

It should also be borne in mind that essay type
questions, even when expertly worded, fail to 
go beyond this limitation unless they are also 
expertly graded. Hence, if modern, easy to 
score, short-answer tests are to measure 
higher mental processes than the memoriter 
reproduction of factual knowledge, they must 
no longer confine the pupil to a consideration 
of two categories only. 

Consideration of several categories has 
long been utilized in the measurement of pupil 
products such as art, composition, and hand- 
writing. In grading these pupil products the 
teacher, consciously or unconsciously, employs 
either a standard scale or a ranking device. If 
pupils are to display their good judgment in a 
consideration of several categories, they too 
must use a ranking device, and their choices 
must be so arranged that partial credit can 
easily be given. 

These, then, are the special demands on 
Part C: it must be a test of judgment, it must 
not confine the attention of the pupil to only two 
categories, it must permit the pupil to employ 
a ranking device, the knowledge factor must be 
kept as constant as possible. As before, the 
general demands are: science content, easy to 
administer and to score, valid, and with a high 
degree of reliability. 
In order to keep the knowledge factor as constant as possible, a
paragraph was prepared on a controversial topic, the common cold. As
most pupils suffer from colds, it was expected that they would find
the topic interesting. Facts to be included in the paragraph were
culled from reports of the United States Public Health Service.

In order to obtain judgments on these facts 
in the words of the pupils themselves, copies 
of the paragraphs were prepared and distribu- 
ted to a group of sixty incoming college fresh- 
men. These students were asked to write an- 
swers to the following questions in terms of 
facts presented in the paragraph: 


1. Is it possible to prevent colds by vaccin- 
ation? 

2. What is the best way to treat a cold? 

3. What causes colds? 

4. What is the value of research concerning 
the common cold and who should pay for 
it? 

5. What is the use of eating vitamins to help 
prevent colds? 

6. Should a person suffering from a cold 
call in the doctor? 


Answers to each of these six questions were 
carefully examined and sorted into four cate- 
gories: 


1. A sound, careful, scientific statement in 
light of the facts given in the paragraph. 

2. A fairly good answer marred by some ir- 
relevant or incorrect detail. 

3. A totally irrelevant answer. 

4. A totally incorrect answer. 


A single answer was chosen from each of 
the four categories for each question, giving a 
list of 24 statements in all. Each statement 
was chosen arbitrarily on the basis of clarity 
of expression, length, and ease with which it 
fitted into one or other of the established cate- 
gories. The four responses to each question 
were then shuffled so that the order in which 
they appeared on the test gave no clue to the 
merits of the response. These four statements 
were labelled: ABCD. Now it is possible to 
combine these four letters into six pairs: 


1. AB    4. BC 
2. AC    5. BD 
3. AD    6. CD 


When a pupil takes the test he has merely to 
circle one of the letters from each of the six 
pairs. Reference to the test itself at the end 
of this article, especially to the directions 
telling how to do the example, will make the 
novel process quite clear. Although pupils 
were not told as much, many of them soon real- 
ized that they were ranking the four statements 
in such a way that they could be given partial 
credit. Although some of the decisions are 
harder to make than others, each correct 
choice is given the same credit, so that the 
test is scored by simply counting the number 
of correct responses. 
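
The scoring rule just described can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study: the function names and the sample pupil are hypothetical, and the key is the one printed for the worked example of Part C, which corresponds to ranking the four statements C, A, D, B from best to worst.

```python
from itertools import combinations

def key_from_ranking(ranking):
    """Return the preferred letter for each of the six pairs,
    given a best-to-worst ranking such as "CADB" (hypothetical helper)."""
    position = {letter: i for i, letter in enumerate(ranking)}
    pairs = combinations(sorted(ranking), 2)  # AB, AC, AD, BC, BD, CD
    return {a + b: (a if position[a] < position[b] else b) for a, b in pairs}

def score(choices, key):
    """Count the pairs on which the circled letter agrees with the key."""
    return sum(1 for pair, letter in choices.items() if key[pair] == letter)

key = key_from_ranking("CADB")    # key of the worked example
pupil = key_from_ranking("CDAB")  # a pupil who interchanges the middle two statements
print(key)
print(score(pupil, key), "of", len(key))
```

A pupil whose ranking differs from the key in only one adjacent pair still agrees with the key on five of the six choices, which is how the paired format yields partial credit.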

The set of directions given is the result of 
considerable experimentation. It can be stated 
that with the directions shown the test is prac- 
tically self-administering. Use of an answer 
sheet decreases the cost of administering and 
scoring the test. Machine scoring is entirely 
feasible. The process is flexible and could 
easily be adapted to any other testing situation 
where pupils are to rank a series of statements 
or products. 


PRELIMINARY TESTING 


These two part tests were now administered 
to a group of 200 students ranging in level from 
grade IX to grade XIV. Table I shows the num- 
ber of pupils tested in each grade, the mean 
score, standard deviation, and the range. 

This table gives certain interesting inform- 
ation. In the first place, the tests are of ap- 
proximately the right difficulty for secondary 
school pupils. Average scores for Part A show 
a steady increase for the higher grades. Aver- 
age scores for Part C show an increase for the 
higher grades, but it is not as steady. 

Outstanding is the fact that increase in mean 
scores from grade to grade is completely over- 
shadowed by range of scores within a given 
grade. A few pupils in grade IX do much better 
than the average in grade XIV. Standard devi- 
ations indicate that the small range of possible 
scores is being used efficiently. 

Table II gives the self-correlations or coef- 
ficients of reliability obtained by a statistical 
treatment of these scores. Figures given in 
this table indicate sufficient reliability for 
group use. Item analysis of these two part 
tests indicated no unsatisfactory items. 

These tests were given with very little ex- 
planation by teachers in charge of their own 
groups. However, scoring time with answers 
written in the test-booklet was largely taken up 
with turning pages and therefore an answer 
sheet was devised. 


ADMINISTRATION OF THE TESTS 


After the satisfactory preliminary testing 
these two tests were administered to approxi- 
mately 1000 pupils in the San Francisco Bay 
Area. These pupils were of approximately 
average intelligence and were in grades X, XI, 
and XII. 
All tests were administered by regular tea- 
chers in their own class rooms. No formal in- 
structions were given to these teachers and the 
tests were regarded as practically self-admin- 
istering. Each part test required approximate- 
ly twenty minutes of pupil time. 


THE TEST MEASURES THE PUPILS 


Data obtained from scores made by the fair- 
ly large sample of pupils is interesting. Table 
III gives mean scores for the grades. Scores 
obtained on Part A show a steady increase to- 
ward grade XII. Scores obtained on Part C 
show some increase but it is not nearly so def- 
inite. 

Table IV gives critical ratios for sex differ- 
ences. A positive value indicates that boys 
do better than girls. These data give an indi- 
cation that skills measured by these tests may 
be learned. The boys do consistently better 
on Part A and the girls do consistently better 
on Part C. 
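
The text does not state how the critical ratios of Table IV were computed. A common definition, assumed here, is the difference between two group means divided by the standard error of that difference. The sketch below follows that assumption; the function name and the figures passed to it are hypothetical and are not taken from the study.

```python
import math

def critical_ratio(mean_boys, sd_boys, n_boys, mean_girls, sd_girls, n_girls):
    """Difference of means divided by the standard error of the difference.
    A positive value indicates that the boys' mean is the higher one."""
    se_diff = math.sqrt(sd_boys ** 2 / n_boys + sd_girls ** 2 / n_girls)
    return (mean_boys - mean_girls) / se_diff

# Hypothetical groups of 100 boys and 100 girls.
cr = critical_ratio(18.0, 5.0, 100, 16.0, 5.0, 100)
print(round(cr, 2))
```

Interchanging the two groups reverses the sign, which matches the convention of Table IV that a positive value favors the boys.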

Table V gives the range of scores. The ex- 
tent of the range for all grades completely over- 
shadows any increase in mean score toward 
grade XII. This evidence corroborates that 
obtained in the preliminary experiments. 

Some interesting data were obtained by mak- 
ing an error count on all the answer sheets. In 
each grade the number who marked an item 
incorrectly was expressed as a percentage of 
the total number of pupils in the grade who 
wrote the test. 

These percentages displayed remarkable 
regularity. Those obtained from approximate- 
ly 150 boys in high XI and 150 boys in low XI 
gave a coefficient of correlation of .97. 
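
The error count and the comparison of two groups can be sketched as follows. The sketch assumes the coefficient quoted is an ordinary product-moment correlation between the two lists of item percentages; the function names, the four-item key, and the answer sheets are hypothetical.

```python
import math

def error_percentages(answer_sheets, key):
    """Percentage of pupils in a group who marked each item incorrectly."""
    n = len(answer_sheets)
    return [100.0 * sum(1 for sheet in answer_sheets if sheet[i] != key[i]) / n
            for i in range(len(key))]

def pearson(x, y):
    """Product-moment coefficient of correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

key = ["A", "C", "D", "C"]
high_xi = [["A", "C", "D", "C"], ["A", "A", "D", "D"],
           ["B", "C", "D", "D"], ["A", "C", "B", "D"]]
low_xi = [["A", "C", "D", "D"], ["A", "A", "D", "D"],
          ["A", "C", "D", "D"], ["A", "C", "D", "C"]]

pct_high = error_percentages(high_xi, key)
pct_low = error_percentages(low_xi, key)
print(pct_high, pct_low, round(pearson(pct_high, pct_low), 2))
```

A high coefficient indicates that the items one group finds difficult are the same items the other group finds difficult.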

These percentages for item 10 of Part C are 
interesting. They agree with the preliminary 
data when they show that this item is the most 
difficult in the test. There is some evidence 
from the decrease in the percentages that the 
key is correct, but the evidence is not clear- 
cut. Item 10 asks pupils to distinguish between 
III B and C on the test. This item poses a con- 
troversy which goes on to some extent within 
the medical profession itself. 

Although the girls do better on Part C than 
the boys, they do not do better on items 22, 25, 
and 29. Item 22 (V, BC) asks the pupil to dis- 
tinguish between a tolerant and correct state- 
ment and an intolerant statement which is at 
least partly false. The key is in favor of the 
correct statement and three-fourths of the pu- 
pils agree with this decision. 

Item 25 (VI, A B) calls for the same type of 
decision: A. Doctors are useless; B. Doctors 
help. The key chooses B and a majority of the 
boys agree. A majority of the girls disagree. 
They prefer to look after themselves. In the 
preliminary study, 28 out of 30 pupils scoring 
high on the test agreed with the key and 15 out 
TABLE I 


MEANS, STANDARD DEVIATIONS, AND RANGE OF SCORES 


TABLE IV 


CRITICAL RATIOS FOR SEX DIFFERENCES 


          Low 10   High 10   Low 11   High 11   Low 12   High 12 

2.27 3.07 2.33 2.39 2.14 
Part C    -1.04    -2.92     1.81     -1.76     -2.55    -.80 


TABLE V 


RANGE OF SCORES 


          Low 10   High 10   High 11 
Part A    8-25     2-28      6-29 
Part C    11-27    6-28      11-27 
of 30 pupils scoring low on the test also agreed 
with the key. 

Answers to item 29 (VI, B D) show that a 
small majority of the girls are still opposed to 
calling a doctor to treat a cold. 

At the moment many of these decisions are 
of academic interest to the pupils; later they 
may be so important as to affect their very 
lives. Schools should neglect no aspect of their 
training which helps pupils to form correct con- 
clusions. Knowledge certainly the schools 
should teach, but even of greater importance 
is the acquisition of skill in the use of this 
knowledge. 

Self-correlations or reliability coefficients 
for the two part tests are given in Table VI. 
Table VII gives the correlations with intelli- 
gence. 

Results given in Table VII were obtained by 
administering an Otis quick-scoring intelligence 
test. For all groups quoted, less than six 
months separated administration of the two 
tests. 


THE PUPILS MEASURE THE TEST 


Reliability 


According to the data, Part A is a fairly pre- 
cise instrument. When a second form of the 
test was prepared and given to a small group 
who had previously taken the first form, scores 
on the two forms showed a correlation of .80. 


Part A is sufficiently reliable for individual 
use. Should an exceptionally reliable instru- 
ment be required the test could be lengthened. 
Part C is less reliable. With four different 
groups, values of .53, .57, .64, and .75 were 
obtained. By the two-forms method the value 
dropped to .32. In Form A of Part C all judg- 
ments called for are based on facts about the 
common cold. In Form B of the same part the 
judgments called for are based on facts concern- 
ing safe driving. It is difficult to understand 
why a person with good judgment about colds, 
their treatment, cure, and so on, should not be 
able to show good judgment about highway 
driving. An interesting possibility is raised. 
These pupils may not be passing judgment on 
these opinions at all; they may be giving back 
answers they have previously memorized. 
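
The remark above that a test could be lengthened to obtain greater reliability is usually quantified with the Spearman-Brown relation, r_k = k r / (1 + (k - 1) r), where r is the present reliability and k is the factor by which the test is lengthened with comparable items. The article does not name this formula, so its use here is an assumption. The sketch applies it to the four Part C coefficients reported above for a test of double length.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test of reliability r is made k times as long
    with comparable items (Spearman-Brown relation)."""
    return k * r / (1 + (k - 1) * r)

# Reliability coefficients reported for Part C with four different groups.
for r in (0.53, 0.57, 0.64, 0.75):
    print(r, round(spearman_brown(r, 2), 2))
```

The prediction holds only if the added items measure the same thing as the original ones; the low two-forms value of .32 suggests that this condition may not be met for Part C.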


Validity 


Validity, the extent to which a test meas- 
ures that which it purports to measure, is 
much more difficult to gauge than reliability. 
There are a number of clues to validity that 
may be examined. 

First of these clues concerns the relation 
of the test scores to academic level. The pri- 
mary assumption here is that a school program 
as at present constituted does advance the pu- 
pils to a greater or less degree toward any ob- 
jective that pupils and teachers generally rec- 
ognize as worthwhile. Abilities involved in 
critical thinking have been established as one 
of these worthwhile objectives and evidence 
presented in this study shows that pupil scores 
on these tests do increase with academic level. 

Second of the arguments for validity con- 
cerns the relation of test scores to results of 
intelligence tests. This evidence is given in 
Table VII. Coefficients of correlation with 
score on the part test of critical thinking com- 
pared with score on the Otis quick-scoring 
test of mental measurement range from .00 to 
.17. Evidently intelligence helps to determ- 
ine success on the test of critical thinking, but 
it is not the only factor, nor is it the dominant 
factor. 

A third argument for validity is the fact that 
these tests were built deliberately about pro- 
cesses generally accepted in the theory of crit- 
ical thinking. Each item was carefully exam- 
ined to make sure that the mental processes 
needed to reach the correct answer were these 
generally-accepted thinking processes. It is 
hardly necessary to go through each part of 
the test item by item. This is an individual 
matter and must be carried out by anyone in- 
terested in doing so on an individual basis. 


Administrability 





These tests are easy to give and easy to 
score. More than seventy teachers to whom 
the test was presented understood what the pu- 
pils were expected to do and less than one per- 
cent of the pupils failed to interpret their in- 
structions correctly. If the testing techniques 
here developed are to be incorporated into a 
standard test, it will be necessary to elaborate 
on the instructions making each item of pro- 
cedure explicit. When a teacher or adminis- 
trator is concerned only with the progress of 
his own particular group, there is less need 
for such uniformity and the tests as they stand 
may be regarded as self-administering. High 
school pupils are able to complete each part of 
the test of 30 items in fifteen or twenty minutes. 
Most of the pupils find the test interesting and 
are willing to work at it even under fairly ad- 
verse conditions. It is possible to prepare two 
forms of each test of almost equal difficulty. 
Only a single sheet need be used by each pu- 
pil taking two part tests, for repeated use of 
the booklets is possible. 

Scoring and tabulation of scores is facilita- 
ted by use of answer sheets. Machine scoring 
is entirely feasible though not used during the 
present study. There are no doubtful answers, 
at least not on the basis of returns from the 
groups to whom the test was given. Study of 
answers to individual items is also made easy 
by use of the answer sheets. 
SUMMARY AND CONCLUSIONS 


Development of the ability to do critical 
thinking has been established as an objective 
of the secondary schools. 

Reference to careful research on this topic 
has shown the need for new testing techniques, 
especially techniques that are reliable yet easy 
to administer. 

Two new techniques have been described and 
these techniques have been carefully linked to 
an established theory of critical thinking. 

Data have been presented to show that the 
two part tests have been carefully developed. 
Scores from over a thousand pupils taking these 
tests have been used to demonstrate their use- 
fulness to teachers, administrators, and re- 
search workers. 


Conclusions about the tests 





Part A is sufficiently reliable for individual 
use, and Part C for group use. It is possible 
to prepare two forms of these part tests of ap- 
proximately equal difficulty. 

These tests are valid to the extent that they 
do not measure pupils on a basis of intelligence, 
or on a basis of achievement in school. Meas- 
ure the pupils they do, however, and in a reg- 
ular manner. The tests are designed to meas- 
ure critical thinking, and should they be found 
acceptable for the purpose will assist in form- 
ulating a definition of critical thinking. 

Techniques developed render these tests 
easy to give and score. In addition, the direc- 
tions are such that they are practically self- 
administering. 

Because the tests are tied to an established 
theory of critical thinking, they give results 
easily interpreted. 

Part C employs a ranking device capable of 
application in many other testing situations. 

It has been clearly demonstrated that tests 
of critical thinking must make use of content 
familiar to the pupils. 

Pupils do not guess on these tests to any 
great extent. They tend to answer in terms of 
a conviction. 


Conclusions regarding the pupils 





Use of these tests with a fairly large sample 
makes it possible to draw a limited number of 
conclusions regarding the mental processes of 
the pupils. 

Most inevitable is the conclusion that differ- 
ences in abilities measured by these tests are 
much greater between individual pupils in the 
same grade than between groups of boys and 
groups of girls. 

Scores made on Part A show that even in- 
ductive ability does not accompany either much 
knowledge or high intelligence. Differences in 
scores made by boys and girls give some indi- 
cation that skills measured by these tests may 
be taught. 


Implications 


It appears that ability to do critical thinking 
is a valid objective of the schools in that it is 
possible to isolate techniques of critical think- 
ing and test for the acquisition of skill in the 
use of these techniques. 

On the other hand, the results of this sur- 
vey clearly demonstrate that the traditional 
role of the schools as centers of learning is 
valid. If pupils are to be taught to think they 
must first be given something to think about. 
Thinking is possible only with familiar concepts 
for most people. Pupils will not, in general, 
be able to think about material to which they 
have been merely exposed. It is obvious, how- 
ever, that if pupils are given only useless in- 
formation, they will be able to think only use- 
less thoughts. 

Because so little improvement in thinking 
ability is demonstrated by the successive grades 
in school, there is a strong implication in the 
results of this study that the experiences the 
child is led to undergo while he is in school be 
very carefully examined with a view to improv- 
ing the rate at which he acquires skill in crit- 
ical thinking. 
PART A FORM A 

Each of the questions in this part of the test consists of a group of four principles of science fol- 
lowed by five statements of fact. In the blank for each fact place the letter of the corresponding 
principle. Be sure to choose the principle which corresponds most closely to the given fact. Each 
principle may be used more than once or not at all. 


Example: 


A. A low and a high musical note travel at the same speed. 
B. An object has energy when it is able to do work. 
C. Many common diseases are caused by germs. 
D. Pressure developed by a liquid is proportional to the depth. 


1. Typhoid fever may spread through eating uncooked crabs. 
2. When a band is heard a block or so away, the bass drum still seems to 
keep time with the piccolo. 
3. A bubble of air grows larger as it rises to the surface of a pond. 
4. A modern way of wrecking a building is to swing a huge iron ball from 
the end of a crane. 
5. Tuberculosis is a disease caused by the presence of the tubercle bacil- 
lus in the body of the patient. 


Do not write in this booklet. A blank for each fact is provided on a separate answer sheet. 
Write your name, school, and grade on the answer sheet in the spaces provided. 


Record your answers in the proper places on the answer sheet. 





ANSWER SHEET 




PART A — Matching facts and principles. Be sure to get your answer in the proper place to 
receive credit. 





























I     II    III   IV    V     VI 
1     6     11    16    21    26 
2     7     12    17    22    27 
3     8     13    18    23    28 
4     9     14    19    24    29 
PART C — Judging opinions. Note that there are six questions to each section. Section I is 
the example. 


I 


A. Gases must be cooled below a certain temperature in order to be liquefied. 
B. A gas always expands to fill any container. 
C. All matter may change its state by absorbing or releasing energy. 
D. Boiling point of a solution varies directly as the pressure. 


1. Liquid sulfur dioxide contained in a closed tube changes suddenly to a gas when the temperature 
is raised above a certain point. 
2. It takes a long time to boil an egg up in an airplane. 
3. When a quart of air is put into a two quart container, it fills the whole of the two quart space. 
4. A blown up balloon becomes round like a ball. 
5. Water boils in a kettle on the stove. 


II 


A. A body immersed or floating in a liquid is buoyed up by a force equal to the weight of the fluid 
displaced. 
B. Pull of gravity is proportional to the mass of the body. 
C. Pull of gravity is inversely proportional to the square of the distance between the centers of 
mass. 
D. To every action there is an equal and opposite reaction. 


6. A body weighs slightly less at the equator than at the north pole where the earth is flattened. 
7. A piece of iron floats on a dish of mercury. 
8. Two cubic inches of lead weigh twice as much as one cubic inch of lead. 
9. A boy tries to jump from a row boat to a wharf but the boat moves from under him and he falls 
in the water. 
10. Stopping a baseball with the bare hands is apt to be painful. 


III 


A. When waves pass from a rare to a dense substance they are bent or refracted. 
B. Whenever an opaque object intercepts radiant energy a shadow is cast behind the object. 
C. When waves strike an object they may stop, pass through or be reflected. 
D. When the source of a sound is approaching the pitch of the sound produced is raised. 


11. There is a sudden drop in pitch when a moving automobile which is sounding its horn passes us 
on the highway. 
12. An oar dipping into the water seems to be bent. 
13. A girl can see her image in a mirror. 
14. Dew on the north side of a hill fails to dry on a spring day. 
15. Some people call the rear vision mirror a ‘‘loobackroscope’’. 
IV 


A. A solution boils at a higher temperature than the pure liquid. 
B. Different substances expand different amounts when heated through the same temperature range. 
C. Dark rough surfaces give up or lose heat energy faster than light colored smooth surfaces. 
D. The higher the temperature of a body the more heat it radiates. 


16. A fluorescent lamp is more efficient than the ordinary kind of lamp. 
17. An aluminum piston will often ‘‘slap’’ in a steel cylinder when the motor is cold. 
18. When salt is added to the potatoes they stop boiling for a minute or two. 
19. Areas near the sea enjoy a temperate climate, cool in summer and mild in winter. 
20. After a day in the snow it feels good to get beside a red-hot stove. 


V 


A. It is possible to develop new types of plants by cross-breeding. 
B. The sun is the original source of nearly all energy used on the earth. 
C. The cell is the unit of structure and function in all plants and animals. 
D. Plants and animals in a given environment are mutually interdependent. 


21. Coal was made from green plants. 
22. Plants use carbon dioxide and sunlight to make starch and animals eat starch and breathe out 
carbon dioxide. 
23. Luther Burbank was able to produce a white blackberry. 
24. Root hairs grow near the tip of a root. Their purpose is to gather moisture and soil minerals 
for the plant. 
25. A balanced aquarium takes care of itself. There is no need to feed the fish or change the 
water. 


VI 


A. Bodies in rotation tend to fly out in a straight line. 

B. Like magnetic poles repel and unlike poles attract. 

C. Rubbing two insulated materials together causes one to take on a positive charge and the other 
a negative charge of electricity. 

D. Like electrical charges repel and unlike charges attract. 





PART C FORM A 
This is a test of your ability to judge which is the better of two opinions. Certain facts will be 
stated which you will assume to be true, and then you will judge opinions in the light of these facts. 

All the different opinions are based on facts about the common cold. Here are the facts. Read 
them carefully and refer to them again later if you wish. 


Colds cause more loss of working time in the United States than strikes. Another $400 mil- 
lion spent for ‘‘remedies’’ brings the annual cost in America to $2 billion. In addition, 
colds may lower resistance and pave the way for more serious disease. 


Doctors who make thirty-two percent of their calls to people who have colds tell us they 
have no specific treatment to shorten the time of the disease. 


Five years ago, Congress voted the sum of $50,000 to the United States Public Health 
Service for research into the common cold. Through this research a virus able to cause 
colds has been isolated by washing with sterile skimmed milk the nose of a man just catch- 
ing a cold. Chicken eggs, set to hatch in an incubator, were inoculated with this virus and 
kept alive. 


Fifty-seven out of sixty people, sprayed with this infected egg fluid caught cold. 
Under the electron microscope (many times more powerful than a glass lens microscope) 
the virus appears to be distinctly different from the influenza virus discovered in 1926. 
Twenty years have failed to produce a reliable vaccine for influenza. 


Neither sulfa pills nor penicillin can harm the cold virus. Effect of ultra-violet light is un- 
known. Quinine is useless since doctors believe a slight fever is a help. Cough medicines, 
nose-drops, sprays, and inhalers should be used with care. None can shorten a cold and 
some are harmful. Doctors are generally agreed that a person suffering from a cold should 
rest and keep warm. Hot drinks may help if they are not overdone. Doctors also agree 
that proper diet, suitable exercise, sufficient sleep and clean habits protect against colds 
by maintaining general good health. 


There are six sets of opinions. To make sure that you know what to do the first set will be 
completely worked out. Judgments are to be made in light of the facts just quoted. 


Example: 
I. The prevention of colds by vaccination: 


A. Will never be possible. Money spent on research for a cold vaccine would therefore be 
wasted. 

B. Is now regularly practised by specialists. It is a cheap, reliable method. 

C. Is a future possibility, but even though the virus has been isolated, development of a 
vaccine will probably be a long slow process. 

D. Is a matter of taking pills to prevent excess stomach acidity. 


1. (A) B    2. A (C)    3. (A) D    4. B (C)    5. B (D)    6. (C) D 





Directions for answering the example 





Read the opinions A and B and see which is better. IN THE LIGHT OF THE FACTS STATED 
IN THE PARAGRAPH ABOUT COLDS, A is better so circle the letter A opposite 1. 


1. (A) B 

Read opinions A and C. Which is better? C is better, so circle C. 
2. A (C) 

Continue with the remaining pairs. 


3. (A) D    4. B (C)    5. B (D)    6. (C) D 


You note that some of the choices are much more difficult to make than others. Perhaps you 
do not agree with the ones chosen. That is all right. Mark your choice from each pair. 


Make no marks in this booklet. Write your name, school, and grade on the answer sheet in the 
spaces provided. Record your answers in the proper places on the answer sheet. 


II. Facts stated about colds indicate that in order to treat a cold intelligently you should, at the 
first sign of a cold: 


A. Keep warm, use a minimum of medicines, and rest as much as possible. Calling in a 
doctor is a waste of his time and your money since there is no available treatment to 
shorten the duration of a cold. 


B. Do some hard work or vigorous exercise so as to sweat it out of the system. Eat heartily 
so that your body will have strength to fight the disease. Take a purge to clear the bowels. 


C. Drink plenty of liquids, avoiding solids in the diet as much as possible. Drink something 
hot before going to bed so as to sweat it out. 


D. Go to bed and stay there until the symptoms are gone. Have your doctor check you over 
and let him prescribe any medicine necessary for relief. Eat a suitable diet. 









III. Here are four opinions regarding the cause of colds. Please compare them as before. 


A. Colds are caused by bacteria which scientists have been able to study with the aid of a 
microscope. Various antiseptics are available which can kill these bacteria. It is the 
people who do not know enough to use proper gargles and sprays that lay themselves 
open to a cold. 


B. Scientists of the United States Public Health Service have proved that one cause of a cold 
is a virus which they were able to isolate and keep alive. We cannot be sure of anything 
beyond this fact. 


C. Colds are produced when a virus from another person enters a body which, because of a 
sudden chill, lack of rest, or improper diet, cannot fight off the cold. Healthy people 
may catch cold if a large quantity of the cold virus gets into their body. 


D. Dusty, poorly ventilated rooms are one of the chief reasons why young children catch so 
many colds at school. These young children don’t know enough to wrap themselves up 
when they go out in the cold. 










IV. It is important to hold an opinion regarding the value of research and the method of paying for 
it. Please compare the following opinions as before. 


A. We have already isolated the virus that causes a cold, and antiseptics, gargles and 
sprays are available to kill it. Any additional research that is needed can be left in the 
hands of the large corporations who make and sell the various cold remedies. 


B. Our citizens should demand that the government spend much larger sums on research 
into the common cold. With ample funds a cure could soon be developed and the country 
as a whole saved up to $2 billion a year and untold suffering could be avoided. 


C. Spending public money for research into the cause of colds is obviously a waste of time. 
It has taken five years to find only a single virus and even after twenty years there is no 
reliable vaccine for influenza. 


D. Research into the cause of the common cold has proven to be of value. It has resulted in 
the only step forward against an enemy which costs this country $2 billion a year. More 
money from the government or from private funds should be available to continue this 
important research. 







V. Eating enough vitamins and minerals is necessary for general good health. Since healthy 
people catch fewer colds: 


A. The best way to avoid colds is to keep the feet dry, exercise regularly and get enough 
sleep. 


B. Vitamins and minerals should be added to the diet if they are lacking, for proper diet is 
essential if a person is to remain free from colds. If a person who gets plenty of vita- 
mins still catches a lot of colds there is a chance that his body is unable to absorb the 
vitamins. 


C. Taking vitamins or any other kind of medicine is poor policy since they do no good. Some 
of these medicines are actually harmful and there is no way for an ordinary person to tell 
the good ones from the bad. You can’t trust advertisements. 


D. People who get enough vitamins and minerals catch cold less often than other people. To 
be sure of getting enough vitamins and minerals in the diet, everyone should take pills 
each day which have all the necessary minerals and vitamins in them. 
VI. You have probably wondered whether or not to call in a doctor when you have a cold. What 
do you think of the following opinions? 


A. A doctor’s visit is probably a waste of his time and your money. 


B. To most people the mere presence of the doctor is of immeasurable help. Although he 
may only act as a moral support, he sets the patient’s mind at rest so that nature can 
effect a recovery. 


C. Although the doctor is unable to shorten the duration of a cold, he can see that the patient 
is free from other diseases that may show the same symptoms. He can check the general 
condition of the patient and can usually ward off complications that may arise, such as 
pneumonia or mastoiditis. 


D. Of all the body ills, colds take one-third of the doctor’s visiting time. Doctor’s time is 
hard to obtain. Therefore, colds are costly. 
A FUNCTIONAL ITEM ANALYSIS OF 
GROUP PERFORMANCE IN 
GENERAL MATHEMATICS 


GRANT J. NORTHRUP, ROBERT E. PINGRY, A. L. WINSOR 
Cornell University 


This paper proposes to summarize both the 
procedure used and the results obtained from 
the use of a functional item analysis of the re- 
sponses made to the items of the Cornell Math- 
ematics test by 416 freshmen entering the New 
York State College of Agriculture at Cornell 
University in September, 1948. 

The Cornell General Mathematics Test was 
developed at Cornell University as a test of 
proficiency in general mathematics and contains 
forty items arranged in an increasing order of 
difficulty. It is divided into two parts. Part I 
is designed to measure ability in fundamental 
operations, while Part II tests ability in prob- 
lem solving. With but two exceptions, all the 
items can be solved by a student proficient in 
arithmetic and elementary algebra. 

In its present form, the test embodies the 
results of one revision of the original based up- 
on a previous item analysis which resulted in 
the elimination of some non-discriminating 
items. The test is objective in nature and uses 
the principle of multiple choice with four alter- 
native answers to each item. It is given annu- 
ally to entering freshmen and the results are 
used for purposes of counseling, sectioning, 
and research. The test was administered with- 
out time limit to the group from which the ac- 
companying data were compiled. 

The data here presented are based upon the 
following procedures. On the basis of scores 
on the test as a whole, the entire group was 
ranked and divided into quintiles. An I.B.M. 
graphic item count was then made for each 
quintile group, showing the number of respon- 
ses to each alternative answer to each question 
om the test. From this information, it was then 
possible to tabulate for each group the exact 
number of correct and incorrect responses to 
each test item. From this tabulation it was 
readily observable that for most of the test it- 
ems, the number of correct answers varied 
directly in relation to the rank of the group be- 
ing studied. This is especially to be expected 
since the test had already been subjected to one 
item analysis for the express purpose of in- 
creasing its internal consistency and hence its 
validity. 
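
The ranking and counting procedure just described can be sketched in code. What follows is only a minimal illustration under stated assumptions, not the procedure actually run on the I.B.M. equipment: the function names, the data layout (a dictionary of total scores and a dictionary of chosen alternatives), and the rule for breaking ties are all hypothetical.

```python
from collections import Counter


def split_into_quintiles(total_scores):
    """Rank examinees by total score and cut the ranking into five groups.

    total_scores: dict mapping examinee id -> total test score.
    Returns five lists of ids, lowest-scoring group first.
    Ties are broken by id, an arbitrary choice the paper does not specify.
    """
    ranked = sorted(total_scores, key=lambda e: (total_scores[e], e))
    n = len(ranked)
    bounds = [round(i * n / 5) for i in range(6)]
    return [ranked[bounds[i]:bounds[i + 1]] for i in range(5)]


def item_count(responses, group, n_items):
    """Count how often each alternative was chosen on each item by one group.

    responses: dict mapping examinee id -> list of chosen alternatives.
    Returns one Counter per item.
    """
    counts = [Counter() for _ in range(n_items)]
    for examinee in group:
        for item, choice in enumerate(responses[examinee]):
            counts[item][choice] += 1
    return counts


def correct_per_item(counts, key):
    """Number of correct responses per item, given the answer key."""
    return [counts[i][key[i]] for i in range(len(key))]
```

Subtracting each count of correct responses from the size of the group gives the corresponding count of incorrect responses.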

For the purposes of this study, the data from
only three of the original five groups (the low,
middle, and high) were used, and a new
comparative factor was introduced.





A careful analysis was made of each item in 
the test for the purpose of determining the fund- 
amental operation or operations involved in its 
solution. As a result of this analysis of the 
forty test items, twenty fundamental operations 
were identified and each test item was listed 
under the operation or operations which were 
judged to be primarily involved in its solution. 
Whenever possible an item was classified un- 
der the one fundamental operation involved. 
This was possible for nineteen items. When 
this could not be done, and when two operations 
seemed to be equally involved, the item was so
classified. This was done in eighteen cases. It 
was necessary to classify two items under three 
different operations and one under four opera- 
tions. A miscellaneous classification was set 
up to take care of three problems involving op- 
erations which did not readily fit into other 
classifications and for which new classifications 
did not seem warranted. 
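
The bookkeeping of this classification can be illustrated with a short sketch. The item numbers and their assignments below are invented for illustration and are not taken from the test; only the counts in `reported` come from the text above. The function names are likewise hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical classification of a few items (not the actual test items).
item_operations = {
    1: ["multiplication of fractions"],
    2: ["division of fractions"],
    3: ["percentage", "deductive reasoning"],
    4: ["ratio and proportion", "translating words to symbols", "percentage"],
}


def items_by_operation(item_ops):
    """Invert item -> operations into operation -> list of items."""
    index = defaultdict(list)
    for item, ops in sorted(item_ops.items()):
        for op in ops:
            index[op].append(item)
    return dict(index)


def multiplicity(item_ops):
    """How many items were listed under one, two, three, ... operations."""
    return Counter(len(ops) for ops in item_ops.values())


# Counts reported in the text: 19 items under one operation, 18 under two,
# 2 under three, and 1 under four.
reported = {1: 19, 2: 18, 3: 2, 4: 1}
total_items = sum(reported.values())                      # 40 items
total_listings = sum(k * n for k, n in reported.items())  # 65 listings
```

The reported counts account for all forty items, and because an item may appear under more than one operation, the forty items produce sixty-five listings in all.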

The next step was to tabulate for each of the 
three quintile groups the right and wrong ans- 
wers to classified groups of test items, thus 
making it possible to determine the correct
and incorrect responses, by quintile groups, to 
the test items classified under a particular fun- 
damental operation. Performance by ranked 
groups on functionally classified test items 
could then be observed. It could now be deter- 
mined on which types of problems performance 
was best and where it was poorest. It was still 
possible to compare groups, and by studying 
performance on individual problems under dif- 
ferent classifications, it was possible to form 
many strong inferences as to the exact location 
of specific difficulties. 
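
The tabulation by group and by fundamental operation amounts to a simple percentage computed over a set of items. The sketch below is offered as one possible rendering; the names `percent_correct` and `performance_table`, and the 0/1 scoring layout, are assumptions made for the example.

```python
def percent_correct(scored, group, items):
    """Percent of correct responses by one group on a set of items.

    scored: dict mapping examinee id -> list of 0/1 item scores.
    group: iterable of examinee ids.
    items: 0-indexed positions of the items classified under one operation.
    """
    group = list(group)
    total = len(group) * len(items)
    if total == 0:
        return 0.0
    right = sum(scored[e][i] for e in group for i in items)
    return 100.0 * right / total


def performance_table(scored, groups, operation_items):
    """Percent correct for every (operation, group) pair."""
    return {
        op: {name: percent_correct(scored, members, items)
             for name, members in groups.items()}
        for op, items in operation_items.items()
    }
```

A table built in this way holds one percentage for each ranked group under each operation, which is the kind of figure plotted in the graph.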

These data are shown in the accompanying 
graph (Figure 1) in which all numerals repre- 
sent percentages of correct answers either to 
classified groups of test items, or in a few in- 
stances, to the test as a whole. It is here pos- 
sible to observe general performance, group 
performance, and performance by groups in re- 
spect to fundamental operations in general math- 
ematics. 


Interpretation of Data 





Figure 1. Graph showing percent of correct responses to functionally classified test items
on the Cornell Mathematics Test by ranked groups of entering Freshmen in the
New York State College of Agriculture. Fall, 1948.

I. The functional item analysis is an aid in
locating the groups most in need of remedial
help. It provides a basis of comparison
between different levels within a group.


Since tests of this sort are often used in sec- 
tioning and counseling of students, their value 
in determining those most in need of special 
help and remedial work is of prime importance.
We are enabled through a functional item analysis
such as this, not only to study the comparative
performance of ranked groups but to be
specific to the point of determining those par- 
ticular respects in which performance was poor- 
est. We introduce an element of diagnosis. 

A study, for instance, of the performance of 
the low group yields striking evidence of the 
differences in group performance together with 
indications as to where the differences are great- 
est. We find that only thirty-eight percent of 
the responses of this group were correct while 
the three groups as a whole averaged sixty- 
five percent of correct responses. When we 
compare this performance with that of the middle
and high groups, both of which were above
the average of the whole group, we gain an indication
of relative group performance which raises
a serious question as to why one group should
be so far out of line. 

It should be pointed out here that part of the 
answer to this question is to be found in the na- 
ture of the group under study. In the College 
of Agriculture, freshmen are admitted for a 
two-year and a four-year course. While practically
all of these are high school graduates,
admission requirements are more flexible for 
the two-year than for the four-year course. 

Distributions of scores were made for 111 
two-year and 263 four-year freshmen taken 
from the sample upon which this analysis was 
made (Figure 2). The effect on the sample of 
the inclusion of the two-year group seems to 
be a piling up of scores at the lower end of the 
scale where the percentage of low scores is
greatest for the two-year group, although in 
absolute numbers there is not a great differ- 
ence due to the relatively smaller number of 
two-year people in the sample. 

While there is a statistically determinable 
difference between the two groups, there is also,
as shown by the distributions, much overlapping.
It is, for instance, to be noted that
7% of the two-year group exceed the mean of 
the four-year group which compares almost 
exactly with the 25% of the two-year students 
who customarily transfer to the four-year 
course on the basis of a 75% average or better 
for the work of the first two years. 
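
The measure of overlap used here, the percent of one group exceeding the mean of the other, can be computed as in the following sketch. The function name is hypothetical, and no scores from the study are reproduced.

```python
from statistics import mean


def percent_exceeding_mean(scores, reference_scores):
    """Percent of `scores` strictly above the mean of `reference_scores`."""
    cutoff = mean(reference_scores)
    above = sum(1 for s in scores if s > cutoff)
    return 100.0 * above / len(scores)
```

Applied to the two distributions, the first argument would hold the scores of the two-year group and the second those of the four-year group.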

In the light of these facts, it would seem 
safe to conclude that, excepting the fact that 
the range of scores for the four-year group 
goes a little higher, both groups contain indi- 
viduals at comparable levels of performance 
over the entire range of scores and that any 
characteristic of performance observable for 
the group as a whole applies to some individuals 
in either group. 
We find that the low group performed below
the average of the whole group in eighteen out 
of twenty classifications of operations, while 
the middle group was below in only seven, and 
the high group in only one. While this is the 
general pattern that should be expected, there 
is evidence of a relatively greater difference 
between the middle and low groups than between 
the middle and high groups. As we have al- 
ready pointed out, part of this difference may 
be attributable to the fact that our sample is 
made up of cases in which the factors of selection
are not identical. What, however, is it
in the character of this low group which causes 
it to deviate so markedly in performance from 
the other two? 

Observation of differences in performance 
in particular fundamental operations yields 
further evidence of the inferior performance 
of the low group. Under “addition and division
of fractions,” “mixed numbers to improper
fractions,” “multiplication of decimals,”
“rounding off,” “signed numbers,” and “deductive
reasoning” we find the middle and upper
groups relatively closer together than the
middle and low groups. Thus we have evidence 
that not only in general performance, but in 
certain particulars of performance, the low 
group is distinctly inferior to the other two to 
an extent greater than might be expected. 

Evidence such as this demands an explana- 
tion. Part of it, as we have already suggested, 
may be in the nature of the sample; however, an
analysis such as this cannot alone provide a 
final answer. A study of the items classified 
under deductive reasoning does, however, sug- 
gest a partial answer. Here the performance 
of the upper and middle groups is well above 
the average of the whole group, while that of 
the low group is distinctly inferior. Two items 
were classified under this function both from 
Part II of the test and of the problem solving 
type. Ninety-three percent of the high group 
and seventy-seven percent of the middle group 
answered these items correctly as compared 
to only twenty-four percent of the low group, a 
percentage of right answers which might have 
been obtained by chance alone. Similarly, on all
of the items classified under “relative sizes of
fractions” and “solution of literal equations”
this group got less than twenty-five percent of
the answers correct. These facts strengthen
the evidence of marked weakness on the part of
the low group in those operations which involve 
the seeing of and dealing with relationships. 
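
Since each item offers four alternative answers, blind guessing would be expected to yield twenty-five percent of correct responses. A rough check of an observed percentage against this chance level is sketched below. The number of responses is an assumption: the size of the low quintile is not reported, so about 83 students (one fifth of 416) answering two items is used for illustration, and the responses are treated as independent, which is only approximately true.

```python
import math


def chance_percent(n_alternatives):
    """Percent correct expected from blind guessing."""
    return 100.0 / n_alternatives


def chance_z(observed_percent, n_responses, n_alternatives=4):
    """Standardized distance of an observed percent from the chance level.

    Uses the binomial standard error and treats all responses as
    independent, a simplifying assumption.
    """
    p = 1.0 / n_alternatives
    se = math.sqrt(p * (1.0 - p) / n_responses)
    return (observed_percent / 100.0 - p) / se


# Assumed: about 83 students in the low group, two items, so 166 responses.
z_low_group = chance_z(24.0, n_responses=166)
```

Under these assumptions the observed twenty-four percent lies well within one standard error of the chance level.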

There is a suggestion here of a degree of in- 
ability on the part of this group to deal with ab- 
stractions, which when added to general disabil- 
ity in computational skills, as evidenced by gen- 
erally low performance in other operations, 
might account for their marked deficiency in 
general mathematics as measured by the test. 

It is possible that the disability of this group 
is so great as to make remedial work imperative
or else seriously impede any effort to do
college work requiring any significant degree of
ability in mathematics. It is also to be noted
that for this low group there is a higher degree
of variability in performance which may be interpreted
as a further indication of weakness
for the group as a whole and further evidence
of the need for further study of its characteristics.


II. The functional item analysis provides the
teacher or supervisor with a basis for 
remedial instruction and the improvement 
of teaching. 


The functional item analysis is also useful 
in pointing out general areas of weakness and 
can therefore be valuable to the teacher or sup- 
ervisor who wishes to determine the specific 
functions in which students are weak. It is, 
therefore, an aid in providing teachers with a 
basis for remedial instruction. 

There is a problem in an analysis such as 
this in determining what is good and what is 
poor performance in a given operation or func- 
tion. In order to provide a practical basis for 
comparison it was decided to let the average 
performance for all groups on the test as a 
whole serve as the criterion of performance 
and consider performance in a given function 
good or bad when the average of all three 
groups in that function was above or below the
average of the three groups for the test as a
whole. This principle of comparison, then, is
based upon the assumption that specific devia- 
tions from the general average represent dif- 
ferences in performance as far as the function 
under consideration is concerned. 
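
The criterion can be stated compactly in code. The sketch below is a hypothetical rendering: the function name and the label "at" for an exact tie are not from the study, and the average across groups is taken unweighted, which assumes ranked groups of equal size.

```python
from statistics import mean


def classify_functions(table, criterion):
    """Label each function by comparing its three-group average to the criterion.

    table: function -> {group name: percent correct}.
    criterion: average percent correct of the same groups on the whole test.
    """
    labels = {}
    for function, by_group in table.items():
        average = mean(list(by_group.values()))
        if average > criterion:
            labels[function] = "above"
        elif average < criterion:
            labels[function] = "below"
        else:
            labels[function] = "at"
    return labels
```

With the sixty-five percent average reported earlier for the three groups as the criterion, a function averaging sixty-three percent would be labeled "below".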

On this basis performance was unsatisfact- 
ory in the following functions: relative sizes of 
fractions, fractions to decimals, solution of 
literal equations, exponents, percentage, tak- 
ing of square roots, translating words to sym- 
bols, ratio and proportion, and deductive reas- 
oning. It is notable, as we have already sug- 
gested, that these are the functions involving, 
for the most part, the seeing of relationships 
and the ability to think abstractly. 

Over 50% of the group were unable to judge 
correctly the relative value of fractions ex- 
pressed with denominators having exponents, 
nor were they able to express a decimal as a 
fraction, both of which are tests of the ability 
to see relationships. Over 40% of the group 
missed problems which called for the ability to 
solve literal equations, knowledge of the theory 
of exponents, percentage, taking of square 
roots, and translation of words to symbols. 

There is an indication that students can con- 
vert fractions to decimals better than they can 
convert decimals to fractions. Of two items 
classified under “fractions to decimals,” eighty-four
percent could convert a fraction to a decimal,
but only forty percent could convert a
decimal to a fraction.

Seven items were classified under “percentage.”
It seems significant that within this
classification performance was consistently 
lower on those test items in which the use of 
percentage occurred in combination with other 
problem-solving and reasoning functions, and 
best on those items in which the use of percent- 
age was the only function involved. While for 
“ratio and proportion” the percent of answers
correct was only slightly below average, 63%, 
all of these items were of the problem-solving 
type and the results attest to the generally low 
level of performance on this and related func- 
tions. 

These indications of performance in respect
to specific functions are parallel to the over- 
all results of the test as a whole which indicate 
generally better performance on Part I than on 
Part II, on computational as compared to prob- 
lem-solving abilities. 

Turning now from a consideration of poor 
performance, in terms of the criterion, to a
consideration of relatively satisfactory per- 
formance, we find the group performing above 
average in the following functions: multiplica- 
tion, addition, and division of fractions, mixed 
numbers to improper fractions, multiplication 
and division of decimals, rounding off, estimat- 
ing, measurement, signed numbers, and mis- 
cellaneous operations. 

Here is further evidence of the generally 
better performance in those operations involv- 
ing more or less straight computation. It is 
possible only to hazard a guess as to why this 
should be. Perhaps it is a case of primacy— 
that people remember best what they learned 
first. Perhaps it may be evidence of better 
teaching in the elementary than in the secondary 
schools since most of these operations are the 
fundamental computational skills which children 
are taught in the early years of arithmetic. Or 
again, it may merely confirm what is well un- 
derstood by many close observers of teaching 
and learning, that people learn best, and teach- 
ers teach best, the concrete and the definite 
while they find the greatest difficulty with the 
abstract and the relative. 

We can by no means conclude that perform- 
ance is satisfactory on the basis of this evidence. 
It may be relatively better. A study of some of 
the individual test items even within these class- 
ifications reveals further differences indicative 
of the general areas of weakness already pointed 
out. Two items were classified under “estimating.”
One involved problem solving and
the other computation. By far the best perfor- 
mance was on the problem involving computa- 
tion only. 

Thus, by comparing group performance in 
the various classified functions it becomes pos- 
sible to identify the general areas of weakness 
both for the particular ranked groups and for 
the group as a whole. It is also possible by inspecting
the performance on items under various
classifications to determine even more
specifically where performance is poorest and 
where it is best. 

There is, of course, an obvious limitation 
upon the validity of this procedure in that as a 
diagnostic device it is admittedly no better than 
the classification of functions upon which it is 
based. If they are valid, the procedure is val- 
id. It would seem worthwhile to consider the 
possibility of constructing tests along functional 
lines thus setting up a sounder basis for sub- 
sequent functional analysis. It would seem pos- 
sible by such a procedure not only to improve 
tests but to render them more valuable as diag- 
nostic instruments. 


III. The functional item analysis is useful in
improving the reliability and validity of 
the test itself. 


A third respect in which the functional item 
analysis is useful is in determining the validity 
of the test itself. Any analysis of test results 
leads to such questions as: Does the test meas- 
ure what it purports to measure? Do the indi- 
cated differences in performance represent 
true differences in the abilities of the persons 
being tested? Is the test making valid discrim- 
inations between individuals and groups? 

Since in this analysis we are dealing only 
with responses to functionally grouped test it- 
ems, we cannot compare the results of any 
single item to the results of the test as a whole,
which is necessary if one is interested in deter- 
mining good and bad test items. It is, however, 
possible in this type of analysis to determine 
those categories of items which are most con- 
sistent with the results of the test as a whole, 
thus possibly providing a sounder basis of test 
construction because of the consideration given 
to functional factors. It is also possible to de- 
termine those categories of items which are or 
are not making valid discriminations in respect 
to a specific function. 

It would seem possible to increase the reli- 
ability of a test by constructing it on the basis 
of a functional item analysis. Such analysis 
should be useful in insuring a proper sampling 
of abilities in each function and in determining 
if the results obtained from the items under a 
given classification are contributing their prop- 
er weight to the results of the test as a whole. 

With such considerations as these in mind, 
we find that practically every category of it- 
ems reveals significant differences between 
the three groups, differences which with but 
one or two exceptions could be established sta- 
tistically. We are not, however, as interested 
in the significance of these differences as we 
are in the relativity of them. When we inspect 
these differences in certain categories, we find 
groups of items definitely marking off the low
from the middle group, but not as definitely 
the middle from the high. In a few cases there 
is very little discrimination between the middle 
and the high groups. It is possible that a time 
limit, a lengthening of the test, or the construction
of more difficult items might correct this
weakness in the test, at least for administra- 
tion to this or a similar group. 

To illustrate, the test fails to discriminate 
to any very great extent between the middle and 
high groups, but does discriminate between the 
middle and low groups in the following functions:
addition and division of fractions, mixed num- 
bers to improper fractions, multiplication of 
decimals, rounding off, and the miscellaneous 
category. These, again, are the computation- 
al skills learned early in arithmetic. The fact 
that the low group performed below the others 
may indicate disability of long standing and the 
test is useful in pointing this out. On the other 
hand, the fact that these items do not clearly 
separate the middle and upper groups is a pos- 
sible indication that for these individuals at 
least, the items are too easy and should either 
be replaced, increased in number, made more 
difficult, or a time limit imposed. 
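
The comparison of adjacent groups can be made explicit by computing the gap in percent correct between the middle and low groups and between the high and middle groups. In the sketch below the threshold of ten percentage points is an arbitrary assumption chosen for illustration; the study sets no such cutoff. The function names are hypothetical.

```python
def adjacent_gaps(by_group):
    """Differences in percent correct between adjacent ranked groups."""
    return {
        "middle-low": by_group["middle"] - by_group["low"],
        "high-middle": by_group["high"] - by_group["middle"],
    }


def weak_upper_discrimination(table, min_gap=10.0):
    """Functions separating low from middle but not middle from high.

    min_gap is an assumed threshold in percentage points.
    """
    flagged = []
    for function, by_group in table.items():
        gaps = adjacent_gaps(by_group)
        if gaps["high-middle"] < min_gap <= gaps["middle-low"]:
            flagged.append(function)
    return flagged


# Percentages for deductive reasoning as reported in the text.
deductive = {"low": 24, "middle": 77, "high": 93}
gaps = adjacent_gaps(deductive)
```

For deductive reasoning the gaps are fifty-three points between the low and middle groups and sixteen between the middle and high groups, so the items in that category separate all three groups, though the lower pair far more sharply.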

From another point of view, this constitutes 
evidence that in some respects our upper and 
middle groups have abilities in common not ev- 
idenced in the performance of the low group. 
In certain functions, at least two-thirds of our 
whole group is comparable, while in others 
there are still marked differences. We are, 
therefore, able to determine with more certain- 
ty where our greatest effort at remedial work 
should be placed.

A study of the performance under multiplica- 
tion and division of fractions suggests a possible 
answer to the marked differentiation between 
the low group and the other two. The two oper- 
ations are fundamentally the same, yet the low 
group does much more poorly in division than 
in multiplication of fractions. It is likely that 
the difference is due to a failure to grasp or to 
remember the principle of inversion. It is just 
this type of error that the poorly prepared stu- 
dent is likely to make, and in showing this up 
our analysis gives us a basis for determining 
whether our test is doing what we want it to do. 
The difference may be in preparation. 

The responses to the items classified under 
“translation of words to symbols” suggest that
they were too difficult for all groups and raise
the question as to whether items as difficult as
these apparently were should be included in a
test to be administered at this level. In re- 
spect to most of the other categories which we 
have not specifically mentioned, we find what 
appears to be good discrimination between all 
three groups. In the light of such evidence our 
analysis lends support to the validity of the items
in these categories, and so far as internal
consistency is an indication of validity, gives
us a reason for confidence in our test. At the
same time, as we have pointed out, there is
evidence of certain inconsistencies which may
well become the basis for improvement of the
test.


Summary and Conclusions 





A functional item analysis involves the prin- 
ciple of analyzing performance, on an objective 
test, of ranked groups in respect to functional 
classifications of test items. It may be useful 
to teachers and supervisors in the following 
ways: 


1. It may be an aid in locating for the teacher
those most in need of remedial help
by making possible a comparison of test
performance at different levels of ability
within the same group.

2. It reveals to the teacher specific areas
of weakness and provides a basis for
remedial work.

3. It is an aid in determining the internal
consistency of a test and may be a valuable
indication of its validity.

4. By making possible the analysis of test
results on a functional basis it is a promising
means of improving both the validity
and reliability of objective tests.

5. It is a valuable supervisory device because
it not only permits the analysis
of group performance at different levels
of ability but points out particular respects
in which there is group weakness
in performance.





